Re: [RFC PATCH v2 0/1] improve vmap allocation

2019-04-02 Thread Uladzislau Rezki
On Mon, Apr 01, 2019 at 03:59:39PM -0700, Andrew Morton wrote:
> On Mon, 1 Apr 2019 13:03:47 +0200 Uladzislau Rezki  wrote:
> 
> > Hello, Andrew.
> > 
> > >
> > > It's a lot of new code. I t looks decent and I'll toss it in there for
> > > further testing.  Hopefully someone will be able to find the time for a
> > > detailed review.
> > > 
> > I have got some proposals and comments about simplifying the code a bit.
> > So i am about to upload the v3 for further review. I see that your commit
> > message includes whole details and explanation:
> > 
> > http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=b6ac5ca4c512094f217a8140edeea17e6621b2ad
> > 
> > Should i base on and keep it in v3?
> > 
> 
> It's probably best to do a clean resend at this stage.  I'll take a
> look at the deltas and updating changelogs, etc.
Will resend then.

Thank you.

--
Vlad Rezki


Re: [RFC PATCH v2 0/1] improve vmap allocation

2019-04-01 Thread Andrew Morton
On Mon, 1 Apr 2019 13:03:47 +0200 Uladzislau Rezki  wrote:

> Hello, Andrew.
> 
> >
> > It's a lot of new code. I t looks decent and I'll toss it in there for
> > further testing.  Hopefully someone will be able to find the time for a
> > detailed review.
> > 
> I have got some proposals and comments about simplifying the code a bit.
> So i am about to upload the v3 for further review. I see that your commit
> message includes whole details and explanation:
> 
> http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=b6ac5ca4c512094f217a8140edeea17e6621b2ad
> 
> Should i base on and keep it in v3?
> 

It's probably best to do a clean resend at this stage.  I'll take a
look at the deltas and updating changelogs, etc.


Re: [RFC PATCH v2 0/1] improve vmap allocation

2019-04-01 Thread Uladzislau Rezki
Hello, Andrew.

>
> It's a lot of new code. I t looks decent and I'll toss it in there for
> further testing.  Hopefully someone will be able to find the time for a
> detailed review.
> 
I have got some proposals and comments about simplifying the code a bit.
So i am about to upload the v3 for further review. I see that your commit
message includes whole details and explanation:

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=b6ac5ca4c512094f217a8140edeea17e6621b2ad

Should i base on and keep it in v3?

Thank you in advance!

--
Vlad Rezki


Re: [RFC PATCH v2 0/1] improve vmap allocation

2019-03-22 Thread Joel Fernandes
On Fri, Mar 22, 2019 at 05:52:59PM +0100, Uladzislau Rezki wrote:
> On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote:
> > On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" 
> >  wrote:
> > 
> > > Hello.
> > > 
> > > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. 
> > > Instead of
> > > referring you to that link, i will go through it again describing the 
> > > improved
> > > allocation method and provide changes between v1 and v2 in the end.
> > > 
> > > ...
> > >
> > 
> > > Performance analysis
> > > 
> > 
> > Impressive numbers.  But this is presumably a worst-case microbenchmark.
> > 
> > Are you able to describe the benefits which are observed in some
> > real-world workload which someone cares about?
> > 
> We work with Android. Google uses its own tool called UiBench to measure
> performance of UI. It counts dropped or delayed frames, or as they call it,
> jank. Basically if we deliver 59(should be 60) frames per second then we
> get 1 junk/drop.

Agreed. Strictly speaking, "1 Jank" is not necessarily "1 frame drop". A
delayed frame is also a Jank. Just because a frame is delayed does not mean
it is dropped, there is double buffering etc to absorb delays.

> I see that on our devices avg-jank is lower. In our case Android graphics
> pipeline uses vmalloc allocations which can lead to delays of UI content
> to GPU. But such behavior depends on your platform, parts of the system
> which make use of it and if they are critical to time or not.
> 
> Second example is indirect impact. During analysis of audio glitches
> in high-resolution audio the source of drops were long alloc_vmap_area()
> allocations.
> 
> # Explanation is here
> ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> 
> # Audio 10 seconds sample is here.
> # The drop occurs at 00:09.295 you can hear it
> ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav

Nice.

> > It's a lot of new code. I t looks decent and I'll toss it in there for
> > further testing.  Hopefully someone will be able to find the time for a
> > detailed review.
> > 
> Thank you :)

I can try to do a review fwiw. But I am severely buried right now. I did look
at vmalloc code before for similar reasons (preempt off related delays
causing jank / glitches etc). Any case, I'll take another look soon (in next
1-2 weeks).

> > Trivial point: the code uses "inline" a lot.  Nowadays gcc cheerfully
> > ignores that and does its own thing.  You might want to look at the
> > effects of simply deleting all that.  Is the generated code better or
> > worse or the same?  If something really needs to be inlined then use
> > __always_inline, preferably with a comment explaining why it is there.
> > 
> When the main core functionalities are "inlined" i see the benefit. 
> At least, it is noticeable by the "test driver". But i agree that
> i should check one more time to see what can be excluded and used
> as a regular call. Thanks for the hint, it is worth to go with
> __always_inline instead.

I wonder how clang behaves as far as inline hints go. That is how Android
images build their kernels.

thanks,

 - Joel



Re: [RFC PATCH v2 0/1] improve vmap allocation

2019-03-22 Thread Uladzislau Rezki
On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote:
> On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" 
>  wrote:
> 
> > Hello.
> > 
> > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead 
> > of
> > referring you to that link, i will go through it again describing the 
> > improved
> > allocation method and provide changes between v1 and v2 in the end.
> > 
> > ...
> >
> 
> > Performance analysis
> > 
> 
> Impressive numbers.  But this is presumably a worst-case microbenchmark.
> 
> Are you able to describe the benefits which are observed in some
> real-world workload which someone cares about?
> 
We work with Android. Google uses its own tool called UiBench to measure
performance of UI. It counts dropped or delayed frames, or as they call it,
jank. Basically if we deliver 59(should be 60) frames per second then we
get 1 junk/drop.

I see that on our devices avg-jank is lower. In our case Android graphics
pipeline uses vmalloc allocations which can lead to delays of UI content
to GPU. But such behavior depends on your platform, parts of the system
which make use of it and if they are critical to time or not.

Second example is indirect impact. During analysis of audio glitches
in high-resolution audio the source of drops were long alloc_vmap_area()
allocations.

# Explanation is here
ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt

# Audio 10 seconds sample is here.
# The drop occurs at 00:09.295 you can hear it
ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav

>
> It's a lot of new code. I t looks decent and I'll toss it in there for
> further testing.  Hopefully someone will be able to find the time for a
> detailed review.
> 
Thank you :)

> Trivial point: the code uses "inline" a lot.  Nowadays gcc cheerfully
> ignores that and does its own thing.  You might want to look at the
> effects of simply deleting all that.  Is the generated code better or
> worse or the same?  If something really needs to be inlined then use
> __always_inline, preferably with a comment explaining why it is there.
> 
When the main core functionalities are "inlined" i see the benefit. 
At least, it is noticeable by the "test driver". But i agree that
i should check one more time to see what can be excluded and used
as a regular call. Thanks for the hint, it is worth to go with
__always_inline instead.

--
Vlad Rezki


Re: [RFC PATCH v2 0/1] improve vmap allocation

2019-03-21 Thread Andrew Morton
On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)"  
wrote:

> Hello.
> 
> This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead of
> referring you to that link, i will go through it again describing the improved
> allocation method and provide changes between v1 and v2 in the end.
> 
> ...
>

> Performance analysis
> 

Impressive numbers.  But this is presumably a worst-case microbenchmark.

Are you able to describe the benefits which are observed in some
real-world workload which someone cares about?

It's a lot of new code. I t looks decent and I'll toss it in there for
further testing.  Hopefully someone will be able to find the time for a
detailed review.

Trivial point: the code uses "inline" a lot.  Nowadays gcc cheerfully
ignores that and does its own thing.  You might want to look at the
effects of simply deleting all that.  Is the generated code better or
worse or the same?  If something really needs to be inlined then use
__always_inline, preferably with a comment explaining why it is there.