Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-28 Thread Lukasz Luba

Hi Vincent,

On 7/27/20 11:59 AM, vincent.donnef...@arm.com wrote:

From: Vincent Donnefort 

Introducing two macro helpers u64_32read() and u64_32read_set_copy() to
factorize the u64 min_vruntime and last_update_time reads on 32-bit
architectures. Those new helpers encapsulate smp_rmb() and smp_wmb()
synchronization and therefore add a small penalty in set_task_rq_fair()
and init_cfs_rq().

The choice of using a macro over an inline function is driven by the
conditional u64 variable copy declarations.

   #ifndef CONFIG_64BIT
   u64 [min_vruntime|last_update_time]_copy;
   #endif

Signed-off-by: Vincent Donnefort 



I've run it on a 32-bit ARM big.LITTLE platform (odroid xu3) with
CONFIG_FAIR_GROUP_SCHED and CONFIG_CGROUP_SCHED enabled.

I haven't observed any issues while running hackbench or booting the
kernel, and the performance of the affected function set_task_rq_fair()
is the same.

Function profile results after hackbench test (one line per CPU, from
trace_stat/function0..7):

w/ Vincent's patch + performance cpufreq gov
root@odroid:/sys/kernel/debug/tracing# cat trace_stat/function?
  Function            Hit     Time            Avg            s^2
  --------            ---     ----            ---            ---
  set_task_rq_fair    4068    3753.185 us     0.922 us       1.239 us
  set_task_rq_fair    4492    4180.133 us     0.930 us       2.318 us
  set_task_rq_fair    4208    3991.386 us     0.948 us      13.552 us
  set_task_rq_fair    4753    4432.231 us     0.932 us       5.875 us
  set_task_rq_fair    7980    5037.096 us     0.631 us       1.690 us
  set_task_rq_fair    8143    5078.515 us     0.623 us       2.930 us
  set_task_rq_fair    9721    6477.904 us     0.666 us       2.425 us
  set_task_rq_fair    7743    4896.002 us     0.632 us       1.188 us



w/o Vincent's patch + performance cpufreq gov
root@odroid:/sys/kernel/debug/tracing# cat trace_stat/function?
  Function            Hit     Time            Avg            s^2
  --------            ---     ----            ---            ---
  set_task_rq_fair    6525    5830.450 us     0.893 us       3.213 us
  set_task_rq_fair    6646    6069.444 us     0.913 us       9.651 us
  set_task_rq_fair    5988    5646.133 us     0.942 us       7.685 us
  set_task_rq_fair    5883    5390.103 us     0.916 us      29.283 us
  set_task_rq_fair    3844    2561.186 us     0.666 us       0.933 us
  set_task_rq_fair    5515    3491.011 us     0.633 us       9.845 us
  set_task_rq_fair    6947    4808.522 us     0.692 us       5.822 us
  Function

Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-28 Thread peterz
On Mon, Jul 27, 2020 at 04:23:03PM +0100, Vincent Donnefort wrote:

> For 32-bit architectures, both min_vruntime and last_update_time use a
> similar access pattern. This patch is simply an attempt to unify their
> usage by introducing two macros to rely on when accessing them. At the
> same time, it adds a comment regarding the barrier usage, as per the
> kernel policy. So overall this is just a clean-up without any functional
> changes.

Ah, I thought there was perhaps the idea to make use of armv7-lpae
instructions.

Aside from that, I think we need to spend a little time bike-shedding the
API/naming here:

> +# define u64_32read(val, copy) (val)
> +# define u64_32read_set_copy(val, copy) do { } while (0)

How about something like:

#ifdef CONFIG_64BIT

#define DEFINE_U64_U32(name)		u64 name
#define u64_u32_load(name)		name
#define u64_u32_store(name, val)	name = val

#else

#define DEFINE_U64_U32(name)		\
	struct {			\
		u64 name;		\
		u64 name##_copy;	\
	}

#define u64_u32_load(name)		\
({					\
	u64 val, copy;			\
	do {				\
		val = name;		\
		smp_rmb();		\
		copy = name##_copy;	\
	} while (val != copy);		\
	val;				\
})

#define u64_u32_store(name, val)	\
do {					\
	typeof(val) __val = (val);	\
	name = __val;			\
	smp_wmb();			\
	name##_copy = __val;		\
} while (0)

#endif
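
For illustration, a hypothetical call site for this API (the placement in
struct cfs_rq is an assumption made for the example, not part of the
proposal):

	struct cfs_rq {
		/* ... */
		DEFINE_U64_U32(min_vruntime);
		/* ... */
	};

	/* reader: retries until value and copy agree, i.e. an untorn u64 */
	u64 vruntime = u64_u32_load(cfs_rq->min_vruntime);

	/* writer: stores the value, then its copy, ordered by smp_wmb() */
	u64_u32_store(cfs_rq->min_vruntime, new_vruntime);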




Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-28 Thread peterz
On Tue, Jul 28, 2020 at 01:13:02PM +0200, pet...@infradead.org wrote:
> On Mon, Jul 27, 2020 at 04:23:03PM +0100, Vincent Donnefort wrote:
> 
> > For 32-bit architectures, both min_vruntime and last_update_time use a
> > similar access pattern. This patch is simply an attempt to unify their
> > usage by introducing two macros to rely on when accessing them. At the
> > same time, it adds a comment regarding the barrier usage, as per the
> > kernel policy. So overall this is just a clean-up without any functional
> > changes.
> 
> Ah, I thought there was perhaps the idea to make use of armv7-lpae
> instructions.
> 
> Aside from that, I think we need to spend a little time bike-shedding the
> API/naming here:
> 
> > +# define u64_32read(val, copy) (val)
> > +# define u64_32read_set_copy(val, copy) do { } while (0)
> 
> How about something like:
> 
> #ifdef CONFIG_64BIT
> 
> #define DEFINE_U64_U32(name)		u64 name
> #define u64_u32_load(name)		name
> #define u64_u32_store(name, val)	name = val
> 
> #else
> 
> #define DEFINE_U64_U32(name)		\
> 	struct {			\
> 		u64 name;		\
> 		u64 name##_copy;	\
> 	}
> 
> #define u64_u32_load(name)		\
> ({					\
> 	u64 val, copy;			\
> 	do {				\
> 		val = name;		\
> 		smp_rmb();		\
> 		copy = name##_copy;	\
> 	} while (val != copy);		\

wrong order there; we should first read _copy and then the regular one
of course.

> 	val;				\
> })
> 
> #define u64_u32_store(name, val)	\
> do {					\
> 	typeof(val) __val = (val);	\
> 	name = __val;			\
> 	smp_wmb();			\
> 	name##_copy = __val;		\
> } while (0)
> 
> #endif

The other approach is making it a full type and inline functions, I
suppose.
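
A rough sketch of what that full-type alternative could look like (the
names are hypothetical, and the read order already uses the corrected
copy-then-value sequence):

	typedef struct {
		u64 val;
	#ifndef CONFIG_64BIT
		u64 copy;
	#endif
	} u64_u32_t;

	static inline u64 u64_u32_read(const u64_u32_t *p)
	{
	#ifdef CONFIG_64BIT
		return p->val;
	#else
		u64 val, copy;

		do {
			copy = p->copy;
			smp_rmb(); /* pairs with smp_wmb() in u64_u32_write() */
			val = p->val;
		} while (val != copy);

		return val;
	#endif
	}

	static inline void u64_u32_write(u64_u32_t *p, u64 val)
	{
		p->val = val;
	#ifndef CONFIG_64BIT
		smp_wmb(); /* pairs with smp_rmb() in u64_u32_read() */
		p->copy = val;
	#endif
	}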


Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-28 Thread Vincent Donnefort
Hi,

On Tue, Jul 28, 2020 at 02:00:27PM +0200, pet...@infradead.org wrote:
> On Tue, Jul 28, 2020 at 01:13:02PM +0200, pet...@infradead.org wrote:
> > On Mon, Jul 27, 2020 at 04:23:03PM +0100, Vincent Donnefort wrote:
> > 
> > > For 32-bit architectures, both min_vruntime and last_update_time use a
> > > similar access pattern. This patch is simply an attempt to unify their
> > > usage by introducing two macros to rely on when accessing them. At the
> > > same time, it adds a comment regarding the barrier usage, as per the
> > > kernel policy. So overall this is just a clean-up without any
> > > functional changes.
> > 
> > Ah, I thought there was perhaps the idea to make use of armv7-lpae
> > instructions.
> > 
> > Aside from that, I think we need to spend a little time bike-shedding the
> > API/naming here:
> > 
> > > +# define u64_32read(val, copy) (val)
> > > +# define u64_32read_set_copy(val, copy) do { } while (0)
> > 
> > How about something like:
> > 
> > #ifdef CONFIG_64BIT
> > 
> > #define DEFINE_U64_U32(name)		u64 name
> > #define u64_u32_load(name)		name
> > #define u64_u32_store(name, val)	name = val
> > 
> > #else
> > 
> > #define DEFINE_U64_U32(name)		\
> > 	struct {			\
> > 		u64 name;		\
> > 		u64 name##_copy;	\
> > 	}
> > 
> > #define u64_u32_load(name)		\
> > ({					\
> > 	u64 val, copy;			\
> > 	do {				\
> > 		val = name;		\
> > 		smp_rmb();		\
> > 		copy = name##_copy;	\
> > 	} while (val != copy);		\
> 
> wrong order there; we should first read _copy and then the regular one
> of course.
> 
> > 	val;				\
> > })
> > 
> > #define u64_u32_store(name, val)	\
> > do {					\
> > 	typeof(val) __val = (val);	\
> > 	name = __val;			\
> > 	smp_wmb();			\
> > 	name##_copy = __val;		\
> > } while (0)
> > 
> > #endif
> 
> The other approach is making it a full type and inline functions, I
> suppose.

I didn't pick this approach originally, as I thought it would be way too
invasive. Even if that API looks way cleaner, it nonetheless doesn't match
the last_update_time usage: the variable is declared in struct sched_avg
while its copy is in struct cfs_rq.
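
To illustrate the split described above, a simplified view of the two
structures involved (field names as in the kernel sources of that time):

	struct sched_avg {
		u64 last_update_time;
		/* ... */
	};

	struct cfs_rq {
		struct sched_avg avg;
	#ifndef CONFIG_64BIT
		u64 load_last_update_time_copy;
	#endif
		/* ... */
	};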

Moving the copy into sched_avg would mean:

 * Setting the copy for all struct sched_avg in ___update_load_sum(), instead
   of just the cfs_rq.avg in update_cfs_rq_load_avg().

 * Having the DEFINE_U64_U32 declaration in include/linux/sched.h to cover
   struct sched_avg.

-- 
Vincent


Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-08-18 Thread Vincent Donnefort
Hi Peter,

[...]

> > > 
> > > How about something like:
> > > 
> > > #ifdef CONFIG_64BIT
> > > 
> > > #define DEFINE_U64_U32(name)		u64 name
> > > #define u64_u32_load(name)		name
> > > #define u64_u32_store(name, val)	name = val
> > > 
> > > #else
> > > 
> > > #define DEFINE_U64_U32(name)		\
> > > 	struct {			\
> > > 		u64 name;		\
> > > 		u64 name##_copy;	\
> > > 	}
> > > 
> > > #define u64_u32_load(name)		\
> > > ({					\
> > > 	u64 val, copy;			\
> > > 	do {				\
> > > 		val = name;		\
> > > 		smp_rmb();		\
> > > 		copy = name##_copy;	\
> > > 	} while (val != copy);		\
> > 
> > wrong order there; we should first read _copy and then the regular one
> > of course.
> > 
> > > 	val;				\
> > > })
> > > 
> > > #define u64_u32_store(name, val)	\
> > > do {					\
> > > 	typeof(val) __val = (val);	\
> > > 	name = __val;			\
> > > 	smp_wmb();			\
> > > 	name##_copy = __val;		\
> > > } while (0)
> > > 
> > > #endif
> > 
> > The other approach is making it a full type and inline functions, I
> > suppose.
> 
> I didn't pick this approach originally, as I thought it would be way too
> invasive. Even if that API looks way cleaner, it nonetheless doesn't match
> the last_update_time usage: the variable is declared in struct sched_avg
> while its copy is in struct cfs_rq.
> 
> Moving the copy into sched_avg would mean:
> 
>  * Setting the copy for all struct sched_avg in ___update_load_sum(),
>    instead of just the cfs_rq.avg in update_cfs_rq_load_avg().
> 
>  * Having the DEFINE_U64_U32 declaration in include/linux/sched.h to cover
>    struct sched_avg.

Gentle ping



Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-27 Thread Ingo Molnar


* vincent.donnef...@arm.com  wrote:

> From: Vincent Donnefort 
> 
> Introducing two macro helpers u64_32read() and u64_32read_set_copy() to
> factorize the u64 min_vruntime and last_update_time reads on 32-bit
> architectures. Those new helpers encapsulate smp_rmb() and smp_wmb()
> synchronization and therefore add a small penalty in set_task_rq_fair()
> and init_cfs_rq().
> 
> The choice of using a macro over an inline function is driven by the
> conditional u64 variable copy declarations.

Could you please explain how "conditional u64 variable copy declarations"
prevents us from using an inline function for this:

> +/*
> + * u64_32read() / u64_32read_set_copy()
> + *
> + * Use the copied u64 value to protect against data race. This is only
> + * applicable for 32-bit architectures.
> + */
> +#if !defined(CONFIG_64BIT) && defined(CONFIG_SMP)
> +# define u64_32read(val, copy)					\
> +({								\
> +	u64 _val;						\
> +	u64 _val_copy;						\
> +								\
> +	do {							\
> +		_val_copy = copy;				\
> +		/*						\
> +		 * paired with u64_32read_set_copy, ordering access	\
> +		 * to val and copy.				\
> +		 */						\
> +		smp_rmb();					\
> +		_val = val;					\
> +	} while (_val != _val_copy);				\
> +								\
> +	_val;							\
> +})
> +# define u64_32read_set_copy(val, copy)				\
> +do {								\
> +	/* paired with u64_32read, ordering access to val and copy */	\
> +	smp_wmb();						\
> +	copy = val;						\
> +} while (0)
> +#else
> +# define u64_32read(val, copy) (val)
> +# define u64_32read_set_copy(val, copy) do { } while (0)
> +#endif
> +
> +

Thanks,

Ingo


Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-27 Thread Vincent Donnefort
Hi,

On Mon, Jul 27, 2020 at 01:24:54PM +0200, Ingo Molnar wrote:
> 
> * vincent.donnef...@arm.com  wrote:
> 
> > From: Vincent Donnefort 
> > 
> > Introducing two macro helpers u64_32read() and u64_32read_set_copy() to
> > factorize the u64 min_vruntime and last_update_time reads on 32-bit
> > architectures. Those new helpers encapsulate smp_rmb() and smp_wmb()
> > synchronization and therefore add a small penalty in set_task_rq_fair()
> > and init_cfs_rq().
> > 
> > The choice of using a macro over an inline function is driven by the
> > conditional u64 variable copy declarations.
> 
> Could you please explain how "conditional u64 variable copy declarations"
> prevents us from using an inline function for this

I meant that, as the "copy" variable ([min_vruntime|last_update_time]_copy)
is declared only in the !CONFIG_64BIT case, a function call would fail to
compile when CONFIG_64BIT is set:

   u64_32read(cfs_rq->min_vruntime, cfs_rq->min_vruntime_copy);
                                            ^
                                        not declared
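
To make the failure mode concrete, a minimal hypothetical sketch (not code
from the patch):

	#ifdef CONFIG_64BIT
	/* inline variant: both arguments must be valid expressions */
	static inline u64 u64_32read(u64 val, u64 copy)
	{
		return val;
	}
	#endif

	/*
	 * With the macro version, the unused `copy` argument is discarded
	 * by the preprocessor, so cfs_rq->min_vruntime_copy never reaches
	 * the compiler when CONFIG_64BIT is set. With the inline function
	 * above, the argument is evaluated as a real expression and the
	 * build fails, since the field is not declared in that config.
	 */
	u64_32read(cfs_rq->min_vruntime, cfs_rq->min_vruntime_copy);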

I could rephrase this part into something more understandable?

> 
> > +/*
> > + * u64_32read() / u64_32read_set_copy()
> > + *
> > + * Use the copied u64 value to protect against data race. This is only
> > + * applicable for 32-bit architectures.
> > + */
> > +#if !defined(CONFIG_64BIT) && defined(CONFIG_SMP)
> > +# define u64_32read(val, copy)					\
> > +({								\
> > +	u64 _val;						\
> > +	u64 _val_copy;						\
> > +								\
> > +	do {							\
> > +		_val_copy = copy;				\
> > +		/*						\
> > +		 * paired with u64_32read_set_copy, ordering access	\
> > +		 * to val and copy.				\
> > +		 */						\
> > +		smp_rmb();					\
> > +		_val = val;					\
> > +	} while (_val != _val_copy);				\
> > +								\
> > +	_val;							\
> > +})
> > +# define u64_32read_set_copy(val, copy)				\
> > +do {								\
> > +	/* paired with u64_32read, ordering access to val and copy */	\
> > +	smp_wmb();						\
> > +	copy = val;						\
> > +} while (0)
> > +#else
> > +# define u64_32read(val, copy) (val)
> > +# define u64_32read_set_copy(val, copy) do { } while (0)
> > +#endif
> > +
> > +
> 
> Thanks,
> 
>   Ingo

-- 
Vincent


Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-27 Thread peterz
On Mon, Jul 27, 2020 at 11:59:24AM +0100, vincent.donnef...@arm.com wrote:
> From: Vincent Donnefort 
> 
> Introducing two macro helpers u64_32read() and u64_32read_set_copy() to
> factorize the u64 min_vruntime and last_update_time reads on 32-bit
> architectures. Those new helpers encapsulate smp_rmb() and smp_wmb()
> synchronization and therefore add a small penalty in set_task_rq_fair()
> and init_cfs_rq().
> 
> The choice of using a macro over an inline function is driven by the
> conditional u64 variable copy declarations.
> 
>    #ifndef CONFIG_64BIT
>    u64 [min_vruntime|last_update_time]_copy;
>    #endif

This lacks a *why*... why did you get up this morning and write us this
patch.




Re: [PATCH] sched/fair: provide u64 read for 32-bits arch helper

2020-07-27 Thread Vincent Donnefort
On Mon, Jul 27, 2020 at 02:38:01PM +0200, pet...@infradead.org wrote:
> On Mon, Jul 27, 2020 at 11:59:24AM +0100, vincent.donnef...@arm.com wrote:
> > From: Vincent Donnefort 
> > 
> > Introducing two macro helpers u64_32read() and u64_32read_set_copy() to
> > factorize the u64 min_vruntime and last_update_time reads on 32-bit
> > architectures. Those new helpers encapsulate smp_rmb() and smp_wmb()
> > synchronization and therefore add a small penalty in set_task_rq_fair()
> > and init_cfs_rq().
> > 
> > The choice of using a macro over an inline function is driven by the
> > conditional u64 variable copy declarations.
> > 
> >    #ifndef CONFIG_64BIT
> >    u64 [min_vruntime|last_update_time]_copy;
> >    #endif
> 
> This lacks a *why*... why did you get up this morning and write us this
> patch.
> 
>

For 32-bit architectures, both min_vruntime and last_update_time use a
similar access pattern. This patch is simply an attempt to unify their
usage by introducing two macros to rely on when accessing them. At the
same time, it adds a comment regarding the barrier usage, as per the
kernel policy. So overall this is just a clean-up without any functional
changes.
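
For context, a simplified version of the kind of open-coded sequence these
helpers factorize (close to what kernel/sched/fair.c had at the time):

	static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
	{
	#ifndef CONFIG_64BIT
		u64 last_update_time_copy;
		u64 last_update_time;

		do {
			last_update_time_copy = cfs_rq->load_last_update_time_copy;
			smp_rmb();
			last_update_time = cfs_rq->avg.last_update_time;
		} while (last_update_time != last_update_time_copy);

		return last_update_time;
	#else
		return cfs_rq->avg.last_update_time;
	#endif
	}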

-- 
Vincent.