Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-06-25 Thread Sandeep Patil
On Tue, Jun 25, 2019 at 11:53:13AM +0800, Greg Kroah-Hartman wrote:
> On Mon, Jun 24, 2019 at 03:37:07PM -0700, Sandeep Patil wrote:
> > We are trying to make sure that all (most) drivers in an Aarch64 system can
> > be kernel modules for Android, like any other desktop system for
> > example. There are a number of problems we need to fix before that happens
> > ofcourse.
> 
> I will argue that this is NOT an android-specific issue.  If the goal of
> creating an arm64 kernel that will "just work" for a wide range of
> hardware configurations without rebuilding is going to happen, we need
> to solve this problem with DT.  This goal was one of the original wishes
> of the arm64 development effort, let's not loose sight of it as
> obviously, this is not working properly just yet.

I believe the proposed solution in this patch series is just that. I am not
sure what the alternatives are. The alternative suggested was to reuse
pre-existing dt-bindings for dependency based probe re-ordering and resolution.

However, it seems we had no way to *really* check if these dependencies are
the real. So, a device may or may not actually depend on the other device for
probe / initialization when the dependency is mentioned in it's dt node. From
DT's point of view, there is no way to tell this ..

I don't know how this is handled in x86. With DT, I don't see how we can do
this unless DT dependencies are _really_ tied with runtime dependencies (The
cycles would have been apparent if that was the case.

Honestly, the "depends-on" property suggested here just piles on to the
existing state. So, it is somewhat doubling the exiting bindings. It says,
you must use depends-on property to define probe / initialization dependency.
The existing bindings like 'clock', 'interrupt', '*-supply' do not enforce
that right now, so you will have device nodes that have these bindings right
now but don't necessarily need them for successful probe for example.

> 
> It just seems that Android is the first one to actually try and
> implement that goal :)

I guess :)

- ssp

> 
> thanks,
> 
> greg k-h


Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering

2019-06-24 Thread Sandeep Patil
(Responding to the first email in the series to summarize the current
situation and choices we have.)

On Mon, Jun 03, 2019 at 05:32:13PM -0700, 'Saravana Kannan' via kernel-team 
wrote:
> Add a generic "depends-on" property that allows specifying mandatory
> functional dependencies between devices. Add device-links after the
> devices are created (but before they are probed) by looking at this
> "depends-on" property.
> 
> This property is used instead of existing DT properties that specify
> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> is because not all resources referred to by existing DT properties are
> mandatory functional dependencies. Some devices/drivers might be able
> to operate with reduced functionality when some of the resources
> aren't available. For example, a device could operate in polling mode
> if no IRQ is available, a device could skip doing power management if
> clock or voltage control isn't available and they are left on, etc.
> 
> So, adding mandatory functional dependency links between devices by
> looking at referred phandles in DT properties won't work as it would
> prevent probing devices that could be probed. By having an explicit
> depends-on property, we can handle these cases correctly.
> 
> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear
>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
>  

We are trying to make sure that all (most) drivers in an Aarch64 system can
be kernel modules for Android, like any other desktop system for
example. There are a number of problems we need to fix before that happens
ofcourse.

That patch series does the following -

1. Resolve the consumer<->supplier relationship in order for subsystems to
turn off resources to save power is #1 on our list. This is because it will
define how driver and DT writers define and write their code for Android
devices.

2. Resolve cyclic dependency between devices nodes as a side-effect.  That
will make sure Android systems do not suffer from deferred probing and we
have a generic way of serialize driver probes. (This can also be mitigated
somewhat by module dependencies outside of the kernel.)

Subsystems like regulator, interconnect can immediately benefit from #1 and
it makes the module/driver loading possible for Android systems without
regressing power horribly.

After thinking about Rob's suggestion to loop though dependencies, I think it
doesn't work because we don't know what to do when there are cycles. Drivers
sometimes need to have access to phandles in order to retrieve resources from
that phandle. We can't assume that is an implied "probe dependency". Saravana
had several such examples already and it is a _real world_ scenario.

I think the current binding are insufficient to describe the runtime hardware
dependencies. IOW, the current bindi

Re: Alternatives to /sys/kernel/debug/wakeup_sources

2019-06-19 Thread Sandeep Patil
On Wed, Jun 19, 2019 at 10:35:17AM +0200, Rafael J. Wysocki wrote:
> On Wed, Jun 19, 2019 at 1:52 AM Joel Fernandes  wrote:
> >
> > On Tue, Jun 18, 2019 at 7:15 PM Tri Vo  wrote:
> > [snip]
> > > > > > >
> > > > > > > Android userspace reading wakeup_sources is not ideal because:
> > > > > > > - Debugfs API is not stable, i.e. Android tools built on top of 
> > > > > > > it are
> > > > > > > not guaranteed to be backward/forward compatible.
> > > > > > > - This file requires debugfs to be mounted, which itself is
> > > > > > > undesirable for security reasons.
> > > > > > >
> > > > > > > To address these problems, we want to contribute a way to expose 
> > > > > > > these
> > > > > > > statistics that doesn't depend on debugfs.
> > > > > > >
> > > > > > > Some initial thoughts/questions: Should we expose the stats in 
> > > > > > > sysfs?
> > > > > > > Or maybe implement eBPF-based solution? What do you think?
> > > > >
> > > > > We are going through Android's out-of-tree kernel dependencies along 
> > > > > with
> > > > > userspace APIs that are not necessarily considered "stable and forever
> > > > > supported" upstream. The debugfs dependencies showed up on our radar 
> > > > > as a
> > > > > result and so we are wondering if we should worry about changes in 
> > > > > debugfs
> > > > > interface and hence the question(s) below.
> > > > >
> > > > > So, can we rely on /d/wakeup_sources to be considered a userspace API 
> > > > > and
> > > > > hence maintained stable as we do for other /proc and /sys entries?
> > > > >
> > > > > If yes, then we will go ahead and add tests for this in LTP or
> > > > > somewhere else suitable.
> > > >
> > > > No, debugfs is not ABI.
> > > >
> > > > > If no, then we would love to hear suggestions for any changes that 
> > > > > need to be
> > > > > made or we simply just move the debugfs entry into somewhere like
> > > > > /sys/power/ ?
> > > >
> > > > No, moving that entire file from debugfs into sysfs is not an option 
> > > > either.
> > > >
> > > > The statistics for the wakeup sources associated with devices are 
> > > > already there
> > > > under /sys/devices/.../power/ , but I guess you want all wakeup sources?
> > > >
> > > > That would require adding a kobject to struct wakeup_source and exposing
> > > > all of the statistics as separate attributes under it.  In which case 
> > > > it would be
> > > > good to replace the existing wakeup statistics under 
> > > > /sys/devices/.../power/
> > > > with symbolic links to the attributes under the wakeup_source kobject.
> > >
> > > Thanks for your input, Rafael! Your suggestion makes sense. I'll work
> > > on a patch for this.
> >
> > Does that entail making each wake up source, a new sysfs node under a
> > particular device, and then adding stats under that new node?
> 
> Not under a device, because there are wakeup source objects without
> associated devices.
> 
> It is conceivable to have a "wakeup_sources" directory under
> /sys/power/ and sysfs nodes for all wakeup sources in there.


This is what I understood from your initial reply and I think it makes sense.
Thanks again, Rafael.

- ssp

> 
> Then, instead of exposing wakeup statistics directly under
> /sys/devices/.../power/, there can be symbolic links from there to the
> new wakeup source nodes under "wakeup_sources" (so as to avoid
> exposing the same data in two different places in sysfs, which may be
> confusing).
> 
> Cheers!


Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()

2019-06-18 Thread Sandeep Patil
On Wed, Jun 12, 2019 at 12:29:18PM -0700, 'Saravana Kannan' via kernel-team 
wrote:
> On Wed, Jun 12, 2019 at 11:19 AM Rob Herring  wrote:
> >
> > On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
> >  wrote:
> > >
> > > On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > > > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > > >  wrote:
> > > > >
> > > > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via 
> > > > > > > kernel-team wrote:
> > > > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi Saravana,
> > > > > > > > >
> > > > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > > > Why are you resending this rather than replying to Frank's 
> > > > > > > > > > last
> > > > > > > > > > comments on the original?
> > > > > > > > >
> > > > > > > > > Adding on a different aspect...  The independent replies from 
> > > > > > > > > three different
> > > > > > > > > maintainers (Rob, Mark, myself) pointed out architectural 
> > > > > > > > > issues with the
> > > > > > > > > patch series.  There were also some implementation issues 
> > > > > > > > > brought out.
> > > > > > > > > (Although I refrained from bringing up most of my 
> > > > > > > > > implementation issues
> > > > > > > > > as they are not relevant until architecture issues are 
> > > > > > > > > resolved.)
> > > > > > > >
> > > > > > > > Right, I'm not too worried about the implementation issues 
> > > > > > > > before we
> > > > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > > > >
> > > > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > > > 1) This is a configuration property and not describing the 
> > > > > > > > device.
> > > > > > > > Just use the implicit dependencies coming from existing 
> > > > > > > > bindings.
> > > > > > > >
> > > > > > > > I gave a bunch of reasons for why I think it isn't an OS 
> > > > > > > > configuration
> > > > > > > > property. But even if that's not something the maintainers can 
> > > > > > > > agree
> > > > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > > > provider hardware) where the implicit dependencies would 
> > > > > > > > prevent one
> > > > > > > > of the devices from probing till the end of time. So even if the
> > > > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > > > decide the dependencies, we still need some means to override 
> > > > > > > > the
> > > > > > > > implicit dependencies where they don't match the real 
> > > > > > > > dependency. Can
> > > > > > > > we use depends-on as an override when the implicit dependencies 
> > > > > > > > aren't
> > > > > > > > correct?
> > > > > > > >
> > > > > > > > 2) This doesn't need to be solved because this is just 
> > > > > > > > optimizing
> > > > > > > > probing or saving power ("we should get rid of this auto 
> > > > > > > > disabling"):
> > > > > > > >
> > > > > > > > I explained why this patch series is not just about optimizing 
> > > > > > > > probe
> > > > > > > > ordering or

Re: Alternatives to /sys/kernel/debug/wakeup_sources

2019-06-18 Thread Sandeep Patil


Hi Rafael, Viresh etc.

On Tue, Jun 11, 2019 at 10:31:16AM -0700, Tri Vo wrote:
> On Tue, Jun 4, 2019 at 5:23 PM Tri Vo  wrote:
> >
> > Hello Rafael,
> >
> > Currently, Android reads wakeup sources statistics from
> > /sys/kernel/debug/wakeup_sources in production environment. This
> > information is used, for example, to report which wake lock prevents
> > the device from suspending.

Android's usage of the 'wakeup_sources' from debugfs can is linked at[1].
Basically, android's battery stats implementation to plot history for suspend
blocking wakeup sources over device's boot cycle. This is used both for power
specific bug reporting but also is one of the stats that will be used towards
attributing the battery consumption to specific processes over the period of
time.

Android depended on the out-of-tree /proc/wakelocks before and now relies on
wakeup_sources debugfs entry heavily for the aforementioned use cases.

> >
> > Android userspace reading wakeup_sources is not ideal because:
> > - Debugfs API is not stable, i.e. Android tools built on top of it are
> > not guaranteed to be backward/forward compatible.
> > - This file requires debugfs to be mounted, which itself is
> > undesirable for security reasons.
> >
> > To address these problems, we want to contribute a way to expose these
> > statistics that doesn't depend on debugfs.
> >
> > Some initial thoughts/questions: Should we expose the stats in sysfs?
> > Or maybe implement eBPF-based solution? What do you think?

We are going through Android's out-of-tree kernel dependencies along with
userspace APIs that are not necessarily considered "stable and forever
supported" upstream. The debugfs dependencies showed up on our radar as a
result and so we are wondering if we should worry about changes in debugfs
interface and hence the question(s) below.

So, can we rely on /d/wakeup_sources to be considered a userspace API and
hence maintained stable as we do for other /proc and /sys entries?

If yes, then we will go ahead and add tests for this in LTP or
somewhere else suitable.

If no, then we would love to hear suggestions for any changes that need to be
made or we simply just move the debugfs entry into somewhere like
/sys/power/ ?

As a side effect, if the entry moves out of debugfs, Android can run without
mounting debugfs in production that I assume is a good thing.

- ssp

1. 
https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/com/android/internal/os/KernelWakelockReader.java#127


Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()

2019-06-11 Thread Sandeep Patil
On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team 
wrote:
> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand  wrote:
> >
> > Hi Saravana,
> >
> > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > Why are you resending this rather than replying to Frank's last
> > > comments on the original?
> >
> > Adding on a different aspect...  The independent replies from three 
> > different
> > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > patch series.  There were also some implementation issues brought out.
> > (Although I refrained from bringing up most of my implementation issues
> > as they are not relevant until architecture issues are resolved.)
> 
> Right, I'm not too worried about the implementation issues before we
> settle on the architectural issues. Those are easy to fix.
> 
> Honestly, the main points that the maintainers raised are:
> 1) This is a configuration property and not describing the device.
> Just use the implicit dependencies coming from existing bindings.
> 
> I gave a bunch of reasons for why I think it isn't an OS configuration
> property. But even if that's not something the maintainers can agree
> to, I gave a concrete example (cyclic dependencies between clock
> provider hardware) where the implicit dependencies would prevent one
> of the devices from probing till the end of time. So even if the
> maintainers don't agree we should always look at "depends-on" to
> decide the dependencies, we still need some means to override the
> implicit dependencies where they don't match the real dependency. Can
> we use depends-on as an override when the implicit dependencies aren't
> correct?
> 
> 2) This doesn't need to be solved because this is just optimizing
> probing or saving power ("we should get rid of this auto disabling"):
> 
> I explained why this patch series is not just about optimizing probe
> ordering or saving power. And why we can't ignore auto disabling
> (because it's more than just auto disabling). The kernel is currently
> broken when trying to use modules in ARM SoCs (probably in other
> systems/archs too, but I can't speak for those).
> 
> 3) Concerns about backwards compatibility
> 
> I pointed out why the current scheme (depends-on being the only source
> of dependency) doesn't break compatibility. And if we go with
> "depends-on" as an override what we could do to keep backwards
> compatibility. Happy to hear more thoughts or discuss options.
> 
> 4) How the "sync_state" would work for a device that supplies multiple
> functionalities but a limited driver.


To be clear, all of above are _real_ problems that stops us from efficiently
load device drivers as modules for Android.

So, if 'depends-on' doesn't seem like the right approach and "going back to
the drawing board" is the ask, could you please point us in the right
direction?

- ssp


Re: [RFC 0/2] opportunistic memory reclaim of a killed process

2019-04-11 Thread Sandeep Patil
On Thu, Apr 11, 2019 at 12:51:11PM +0200, Michal Hocko wrote:
> On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> [...]
> > Proposed solution uses existing oom-reaper thread to increase memory
> > reclaim rate of a killed process and to make this rate more deterministic.
> > By no means the proposed solution is considered the best and was chosen
> > because it was simple to implement and allowed for test data collection.
> > The downside of this solution is that it requires additional “expedite”
> > hint for something which has to be fast in all cases. Would be great to
> > find a way that does not require additional hints.
> 
> I have to say I do not like this much. It is abusing an implementation
> detail of the OOM implementation and makes it an official API. Also
> there are some non trivial assumptions to be fullfilled to use the
> current oom_reaper. First of all all the process groups that share the
> address space have to be killed. How do you want to guarantee/implement
> that with a simply kill to a thread/process group?
> 
> > Other possible approaches include:
> > - Implementing a dedicated syscall to perform opportunistic reclaim in the
> > context of the process waiting for the victim’s death. A natural boost
> > bonus occurs if the waiting process has high or RT priority and is not
> > limited by cpuset cgroup in its CPU choices.
> > - Implement a mechanism that would perform opportunistic reclaim if it’s
> > possible unconditionally (similar to checks in task_will_free_mem()).
> > - Implement opportunistic reclaim that uses shrinker interface, PSI or
> > other memory pressure indications as a hint to engage.
> 
> I would question whether we really need this at all? Relying on the exit
> speed sounds like a fundamental design problem of anything that relies
> on it.

OTOH, we want to keep as many processes around as possible for recency. In which
case, the exit path (particularly the memory reclaim) becomes critical to
maintain interactivity for phones.

Android keeps processes around because cold starting applications is much
slower than simply bringing them up from background. This obviously presents
the problem when a background application _is_ killed, it is almost always to
address sudden spike in memory needs by something else much more important
and user visible. e.g. a foreground application or critical system process.

> Sure task exit might be slow, but async mm tear down is just a
> mere optimization this is not guaranteed to really help in speading
> things up. OOM killer uses it as a guarantee for a forward progress in a
> finite time rather than as soon as possible.

With OOM killer, things are already really bad. When lmkd[1] kills processes,
it is doing so to serve the immediate needs of the system while trying to
avoid the OOM killer.


- ssp

1] 
https://android.googlesource.com/platform/system/core/+/refs/heads/master/lmkd/


[PATCH v2] mm: proc: smaps_rollup: Fix pss_locked calculation

2019-02-02 Thread Sandeep Patil
The 'pss_locked' field of smaps_rollup was being calculated incorrectly.
It accumulated the current pss everytime a locked VMA was found.  Fix
that by adding to 'pss_locked' the same time as that of 'pss' if the vma
being walked is locked.

Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Cc: sta...@vger.kernel.org # 4.14.y 4.19.y
Signed-off-by: Sandeep Patil 
---

v1->v2
--
- Move pss_locked accounting into smaps_account() inline with pss

 fs/proc/task_mmu.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f0ec9edab2f3..85b0ef890b28 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -423,7 +423,7 @@ struct mem_size_stats {
 };
 
 static void smaps_account(struct mem_size_stats *mss, struct page *page,
-   bool compound, bool young, bool dirty)
+   bool compound, bool young, bool dirty, bool locked)
 {
int i, nr = compound ? 1 << compound_order(page) : 1;
unsigned long size = nr * PAGE_SIZE;
@@ -450,24 +450,31 @@ static void smaps_account(struct mem_size_stats *mss, 
struct page *page,
else
mss->private_clean += size;
mss->pss += (u64)size << PSS_SHIFT;
+   if (locked)
+   mss->pss_locked += (u64)size << PSS_SHIFT;
return;
}
 
for (i = 0; i < nr; i++, page++) {
int mapcount = page_mapcount(page);
+   unsigned long pss = (PAGE_SIZE << PSS_SHIFT);
 
if (mapcount >= 2) {
if (dirty || PageDirty(page))
mss->shared_dirty += PAGE_SIZE;
else
mss->shared_clean += PAGE_SIZE;
-   mss->pss += (PAGE_SIZE << PSS_SHIFT) / mapcount;
+   mss->pss += pss / mapcount;
+   if (locked)
+   mss->pss_locked += pss / mapcount;
} else {
if (dirty || PageDirty(page))
mss->private_dirty += PAGE_SIZE;
else
mss->private_clean += PAGE_SIZE;
-   mss->pss += PAGE_SIZE << PSS_SHIFT;
+   mss->pss += pss;
+   if (locked)
+   mss->pss_locked += pss;
}
}
 }
@@ -490,6 +497,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 {
struct mem_size_stats *mss = walk->private;
struct vm_area_struct *vma = walk->vma;
+   bool locked = !!(vma->vm_flags & VM_LOCKED);
struct page *page = NULL;
 
if (pte_present(*pte)) {
@@ -532,7 +540,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
if (!page)
return;
 
-   smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte));
+   smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte), 
locked);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -541,6 +549,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 {
struct mem_size_stats *mss = walk->private;
struct vm_area_struct *vma = walk->vma;
+   bool locked = !!(vma->vm_flags & VM_LOCKED);
struct page *page;
 
/* FOLL_DUMP will return -EFAULT on huge zero page */
@@ -555,7 +564,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
/* pass */;
else
VM_BUG_ON_PAGE(1, page);
-   smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd));
+   smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd), 
locked);
 }
 #else
 static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
@@ -737,11 +746,8 @@ static void smap_gather_stats(struct vm_area_struct *vma,
}
}
 #endif
-
/* mmap_sem is held in m_start */
walk_page_vma(vma, &smaps_walk);
-   if (vma->vm_flags & VM_LOCKED)
-   mss->pss_locked += mss->pss;
 }
 
 #define SEQ_PUT_DEC(str, val) \
-- 
2.20.1.611.gfbb209baf1-goog



Re: [PATCH] mm: proc: smaps_rollup: Fix pss_locked calculation

2019-02-02 Thread Sandeep Patil
On Tue, Jan 29, 2019 at 04:52:21PM +0100, Vlastimil Babka wrote:
> On 1/29/19 1:15 AM, Andrew Morton wrote:
> > On Sun, 20 Jan 2019 17:10:49 -0800 Sandeep Patil  
> > wrote:
> > 
> >> The 'pss_locked' field of smaps_rollup was being calculated incorrectly
> >> as it accumulated the current pss everytime a locked VMA was found.
> >> 
> >> Fix that by making sure we record the current pss value before each VMA
> >> is walked. So, we can only add the delta if the VMA was found to be
> >> VM_LOCKED.
> >> 
> >> ...
> >>
> >> --- a/fs/proc/task_mmu.c
> >> +++ b/fs/proc/task_mmu.c
> >> @@ -709,6 +709,7 @@ static void smap_gather_stats(struct vm_area_struct 
> >> *vma,
> >>  #endif
> >>.mm = vma->vm_mm,
> >>};
> >> +  unsigned long pss;
> >>  
> >>smaps_walk.private = mss;
> >>  
> >> @@ -737,11 +738,12 @@ static void smap_gather_stats(struct vm_area_struct 
> >> *vma,
> >>}
> >>}
> >>  #endif
> >> -
> >> +  /* record current pss so we can calculate the delta after page walk */
> >> +  pss = mss->pss;
> >>/* mmap_sem is held in m_start */
> >>walk_page_vma(vma, &smaps_walk);
> >>if (vma->vm_flags & VM_LOCKED)
> >> -  mss->pss_locked += mss->pss;
> >> +  mss->pss_locked += mss->pss - pss;
> >>  }
> > 
> > This seems to be a rather obscure way of accumulating
> > mem_size_stats.pss_locked.  Wouldn't it make more sense to do this in
> > smaps_account(), wherever we increment mem_size_stats.pss?
> > 
> > It would be a tiny bit less efficient but I think that the code cleanup
> > justifies such a cost?
> 
> Yeah, Sandeep could you add 'bool locked' param to smaps_account() and check 
> it
> there? We probably don't need the whole vma param yet.

Agree, I will send -v2 shortly.

- ssp


[PATCH] mm: proc: smaps_rollup: Fix pss_locked calculation

2019-01-21 Thread Sandeep Patil
The 'pss_locked' field of smaps_rollup was being calculated incorrectly
as it accumulated the current pss everytime a locked VMA was found.

Fix that by making sure we record the current pss value before each VMA
is walked. So, we can only add the delta if the VMA was found to be
VM_LOCKED.

Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Cc: sta...@vger.kernel.org # 4.14.y 4.19.y
Signed-off-by: Sandeep Patil 
---
 fs/proc/task_mmu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f0ec9edab2f3..51a00a2b4733 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -709,6 +709,7 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 #endif
.mm = vma->vm_mm,
};
+   unsigned long pss;
 
smaps_walk.private = mss;
 
@@ -737,11 +738,12 @@ static void smap_gather_stats(struct vm_area_struct *vma,
}
}
 #endif
-
+   /* record current pss so we can calculate the delta after page walk */
+   pss = mss->pss;
/* mmap_sem is held in m_start */
walk_page_vma(vma, &smaps_walk);
if (vma->vm_flags & VM_LOCKED)
-   mss->pss_locked += mss->pss;
+   mss->pss_locked += mss->pss - pss;
 }
 
 #define SEQ_PUT_DEC(str, val) \
-- 
2.20.1.321.g9e740568ce-goog



Re: [RFC] Provide in-kernel headers for making it easy to extend the kernel

2019-01-20 Thread Sandeep Patil
On Sun, Jan 20, 2019 at 06:49:56PM -0800, h...@zytor.com wrote:
> On January 20, 2019 5:45:53 PM PST, Joel Fernandes  
> wrote:
> >On Sun, Jan 20, 2019 at 01:58:15PM -0800, h...@zytor.com wrote:
> >> On January 20, 2019 8:10:03 AM PST, Joel Fernandes
> > wrote:
> >> >On Sat, Jan 19, 2019 at 11:01:13PM -0800, h...@zytor.com wrote:
> >> >> On January 19, 2019 2:36:06 AM PST, Greg KH
> >> > wrote:
> >> >> >On Sat, Jan 19, 2019 at 02:28:00AM -0800, Christoph Hellwig
> >wrote:
> >> >> >> This seems like a pretty horrible idea and waste of kernel
> >memory.
> >> >> >
> >> >> >It's only a waste if you want it to be a waste, i.e. if you load
> >the
> >> >> >kernel module.
> >> >> >
> >> >> >This really isn't any different from how /proc/config.gz works.
> >> >> >
> >> >> >> Just add support to kbuild to store a compressed archive in
> >> >initramfs
> >> >> >> and unpack it in the right place.
> >> >> >
> >> >> >I think the issue is that some devices do not use initramfs, or
> >> >switch
> >> >> >away from it after init happens or something like that.  Joel has
> >> >all
> >> >> >of
> >> >> >the looney details that he can provide.
> >> >> >
> >> >> >thanks,
> >> >> >
> >> >> >greg k-h
> >> >> 
> >> >> Yeah, well... but it is kind of a losing game... the more
> >in-kernel
> >> >stuff there is the less smiley are things to actually be supported.
> >> >
> >> >It is better than nothing, and if this makes things a bit easier and
> >> >solves
> >> >real-world issues people have been having, and is optional, then I
> >> >don't see
> >> >why not.
> >> >
> >> >> Modularizing is it in some ways even crazier in the sense that at
> >> >that point you are relying on the filesystem containing the module,
> >> >which has to be loaded into the kernel by a root user. One could
> >even
> >> >wonder if a better way to do this would be to have "make
> >> >modules_install" park an archive file – or even a directory as
> >opposed
> >> >to a symlink – with this stuff in /lib/modules. We could even
> >provide a
> >> >tmpfs shim which autoloads such an archive via the firmware loader;
> >> >this might even be generically useful, who knows.
> >> >
> >> >All this seems to assume where the modules are located. In Android,
> >we
> >> >don't
> >> >have /lib/modules. This patch generically fits into the grand scheme
> >> >things
> >> >and I think is just better made a part of the kernel since it is not
> >> >that
> >> >huge once compressed, as Dan also pointed. The more complex, and the
> >> >more
> >> >assumptions we make, the less likely people writing tools will get
> >it
> >> >right
> >> >and be able to easily use it.
> >> >
> >> >> 
> >> >> Note also that initramfs contents can be built into the kernel.
> >> >Extracting such content into a single-instance tmpfs would again be
> >a
> >> >possibility
> >> >
> >> >Such an approach would bloat the kernel image size though, which may
> >> >not work
> >> >for everyone. The module based approach, on the other hand, gives an
> >> >option
> >> >to the user to enable the feature, but not have it loaded into
> >memory
> >> >or used
> >> >until it is really needed.
> >> >
> >> >thanks,
> >> >
> >> > - Joel
> >> 
> >> Well, where are the modules? They must exist in the filesystem.
> >
> >The scheme of loading a module doesn't depend on _where_ the module is
> >on the
> >filesystem. As long as a distro knows how to load a module in its own
> >way (by
> >looking into whichever paths it cares about), that's all that matters. 
> >And
> >the module contains compressed headers which saves space, vs storing it
> >uncompressed on the file system.
> >
> >To remove complete reliance on the filesystem, there is an option of
> >not
> >building it as a module, and making it as a built-in.
> >
> >I think I see your point now - you're saying if its built-in, then it
> >becomes kernel memory that is lost and unswappable. Did I get that
> >right?
> >I am saying that if that's a major concern, then:
> >1. Don't make it a built-in, make it a module.
> >2. Don't enable it at for your distro, and use a linux-headers package
> >or
> >whatever else you have been using so far that works for you.
> >
> >thanks,
> >
> > - Joel
> 
> My point is that if we're going to actually solve a problem, we need to make 
> it so that the distro won't just disable it anyway, and it ought to be 
> something scalable; otherwise nothing is gained.
> 
> I am *not* disagreeing with the problem statement!
> 
> Now, /proc isn't something that will autoload modules. A filesystem *will*, 
> although you need to be able to mount it; furthermore, it makes it trivially 
> to extend it (and the firmware interface provides an . easy way to feed the 
> data to such a filesystem without having to muck with anything magic.)
> 
> Heck, we could even make it a squashfs image that can just be mounted.
> 
> So, first of all, where does Android keep its modules, and what is actually 
> included? Is /sbin/modprobe used to load the modules, as is normal? We might 
>

Re: [PATCH] tracing: do not leak kernel addresses

2018-07-30 Thread Sandeep Patil
On Fri, Jul 27, 2018 at 08:04:28PM -0400, Theodore Y. Ts'o wrote:
> On Fri, Jul 27, 2018 at 03:05:43PM -0700, Sandeep Patil wrote:
> > On Fri, Jul 27, 2018 at 04:21:14PM -0400, Theodore Y. Ts'o wrote:
> > > On Fri, Jul 27, 2018 at 04:11:03PM -0400, Steven Rostedt wrote:
> > > > That said, I would assume that
> > > > other Android utilities are using other debugfs files for system
> > > > status and such.
> > 
> > As of today, I think a lot of information in 'bugreports' is read
> > out of debugfs (including things like binder stats). We do have a plan
> > to change that.
> 
> Hmm, if it's only for bugreports, maybe it can be only mounted when
> about root processes getting tricked into reading from debugfs.

Yes, that's an interesting idea. May be a quicker way to get ourselves
rid of relying on debugfs. We need some platform cleanup to remove
all debugfs accessing code that's not "debug only" first. That work has been
ongoing ..

> 
> 
> > Indeed, I think it can. However, the problem is the last time I tried to
> > remove this a whole bunch of things just broke. So, it wasn't about losing
> > a functionality here and there. Agree, we need to clean up platform to not 
> > use
> > debugfs first. Then we can expect Apps or other native processes to not rely
> > on debugfs at all.
> 
> Is Android controlling access to debugfs files via SELinux?  If so,
> then access to debugfs can be gradually cranked down as use cases are
> removed.

Yes, that's what we've done now, so we know where the code is that depends on
it and working on moving it out. New domains aren't allowed to rely on
debugfs now.

- ssp
> 
>   - Ted
> 
> 
>  


Re: [PATCH] tracing: do not leak kernel addresses

2018-07-27 Thread Sandeep Patil
On Fri, Jul 27, 2018 at 04:21:14PM -0400, Theodore Y. Ts'o wrote:
> On Fri, Jul 27, 2018 at 04:11:03PM -0400, Steven Rostedt wrote:
> > That said, I would assume that
> > other Android utilities are using other debugfs files for system
> > status and such.

As of today, I think a lot of information in 'bugreports' is read
out of debugfs (including things like binder stats). We do have a plan
to change that.

> 
> Yeah, I know we probably have lost the "debugfs is only for debugging
> and has no place in a production system" battle, and we should just
> move on and assume we need to completely harden all of debugfs.  But
> it's worth at least *asking* whether or not the use of debugfs for
> Android can be avoided

Indeed, I think it can. However, the problem is the last time I tried to
remove this a whole bunch of things just broke. So, it wasn't about losing
a functionality here and there. Agree, we need to clean up platform to not use
debugfs first. Then we can expect Apps or other native processes to not rely
on debugfs at all.

The work is in progress..[1]

- ssp

1] 
https://source.android.com/devices/architecture/kernel/modular-kernels#debugfs

> 
>   - Ted
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "kernel-team" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
> 


Re: [PATCH v5 2/2] staging: ion: create one device entry per heap

2017-10-03 Thread Sandeep Patil
On Tue, Oct 03, 2017 at 02:42:32PM -0700, Laura Abbott wrote:
> On 10/03/2017 09:48 AM, Mark Brown wrote:
> > On Mon, Oct 02, 2017 at 11:07:48AM -0700, Laura Abbott wrote:
> > 
> >> Thinking about this a bit more, I'm not 100% sure if this
> >> will allow the security rules we want. Heap ids are assigned
> >> dynamically and therefore so will the /dev/ionX designation.
> >> From my understanding, security rules like selinux need to
> >> be fully specified at boot time so I'm not sure how you would
> >> be able to write rules to differentiate between /dev/ionX and
> >> /dev/ionY without knowing the values at boottime.
> > 
> > Isn't this something that should be managable via udev rules that ensure
> > stable names in the same way as for things like disks or ethernet
> > controllers (even if it just ends up doing something like /dev/ion-gpu
> > or whatever)?  If we're not giving it enough information to assign stable
> > names where needed we probably need to fix that anyway.
> > 
> 
> Android doesn't use a standard udev so we'd need something that
> would work there. My understanding was that android needs everything
> specified at boot unless that's changed.
> 
> There would be enough information to assign stable names
> (e.g. /dev/ion-system) if we used the query ioctl to find out
> what's actually available. Is just the ioctl useful though?

Wouldn't this new ioctl() to query the heap name also result in special case
handling of all ion devices in user space?

If the devices are named with their corresponding heap names like ion-system, 
ion-cma etc.
It is entirely possible and easy in android/ueventd to create those nodes
under "/dev/ion/".  (assuming the heap 'subsystem' for these new devices will
point to 'ion').

Something like the following should work if added in ueventd.rc

  subsystem ion
devname uevent_devname
dirname /dev/ion

Also, makes SElinux labelling easier.

(Also FWIW, the SELinux permissions are also possible with the current ion
implementation by adding rules to disallow specific ioctls instead of adding
permissions to access device node as this change would do)


- ssp

> 
> Thanks,
> Laura
> ___
> devel mailing list
> de...@linuxdriverproject.org
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel