Re: inlining glib functions (Was: public barrier functions)

2005-12-14 Thread Balazs Scheidler
On Tue, 2005-12-13 at 22:13 +0100, Tim Janik wrote:
 On Tue, 13 Dec 2005, Gustavo J. A. M. Carneiro wrote:
 
  On Tue, 2005-12-13 at 17:11 +0100, Tim Janik wrote:
 
   IMHO, some functions are obvious candidates for inlining, regardless
  of any profiling done on them.  For instance:
 
  gchar*
  g_strdup (const gchar *str)
  {
    gchar *new_str;
    gsize length;
 
    if (str)
      {
        length = strlen (str) + 1;
        new_str = g_new (char, length);
        memcpy (new_str, str, length);
      }
    else
      new_str = NULL;
 
    return new_str;
  }
 
  This function is trivial.  I doubt you'll ever find any new bugs in it.
  It is called in many places.  So why pay a performance penalty when you
  could easily avoid it?
 
 inlining doesn't automatically mean performance improvements and
 not inlining doesn't automatically cause performance penalties.
 
 if you start to inline lots of widely used small functions in
 non performance critical code sections, all you've gained is a
 bigger code section size and less likelihood of warm instruction
 caches (that becomes especially critical when starting to bloat
 tight loops due to inlining).
 now consider that 90% of a program's runtime is spent in 10% of its
 code, that means 90% of your inlining does occur in non performance
 critical sections.
 that's why modern compilers use tunable heuristics to decide about
 automated inlining and don't stupidly inline everything they can.

I personally would not suggest inlining every trivial function in GLib. I
was talking about single-instruction functions that are not inlined
right now: the call instruction may well be longer than the instruction
being called. On top of that, functions linked in from shared libraries
jump twice to reach the actual body of the function (first a call to a
stub, which then jumps to the function itself), so the call effectively
empties the instruction pipeline twice.

Nevertheless, I'll shut up and post patches :) Thanks for the information
and sorry for the noise.

-- 
Bazsi

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: public barrier functions

2005-12-13 Thread Sebastian Wilhelmi
Hi Paul,

 But I'd be interested to see some benchmarks; see how much this
 actually matters. Run a typical program twice; once with functions and
 once with some inlines/macros. It's quite likely that in a real-world
 program, the ratio of time it actually spends doing the atomic operation
 function calls, to the amount of CPU time in general, will actually be
 rather small indeed. Such an optimisation is likely to be of little
 actual benefit, for the cost it brings.

We don't have numbers for typical programs, but we do have benchmarks. See

http://bugzilla.gnome.org/show_bug.cgi?id=63621

The numbers for i386 are:

inline   : 4.376484 sec
function : 7.325000 sec
fallback : 23.984717 sec

for ppc64:

inline   : 1.961480 sec
function : 3.328593 sec
fallback : 31.004492 sec
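
For illustration, the relative cost implied by these figures can be worked out with a few lines of C. This is only arithmetic on the numbers quoted above; the struct and function names are made up for the example.

```c
#include <stdio.h>

/* Benchmark timings quoted above, in seconds. */
typedef struct {
    const char *arch;
    double inline_s;
    double function_s;
    double fallback_s;
} Timing;

static const Timing timings[] = {
    { "i386",  4.376484, 7.325000, 23.984717 },
    { "ppc64", 1.961480, 3.328593, 31.004492 },
};

/* How many times slower a variant is than the inline one. */
double slowdown(double inline_s, double other_s)
{
    return other_s / inline_s;
}

/* Print one line per architecture with both ratios. */
void print_slowdowns(void)
{
    for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++) {
        const Timing *t = &timings[i];
        printf("%-5s function/inline = %.2f, fallback/inline = %.2f\n",
               t->arch,
               slowdown(t->inline_s, t->function_s),
               slowdown(t->inline_s, t->fallback_s));
    }
}
```

On both architectures the function-call variant comes out roughly 1.7 times slower than the inline one, while the fallback is about 5.5 times slower on i386 and about 15.8 times slower on ppc64.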

Regards,
Sebastian
-- 
Sebastian Wilhelmi |här ovanför alla molnen
mailto:[EMAIL PROTECTED]  |  är himlen så förunderligt blå
http://seppi.de|




Re: public barrier functions

2005-12-13 Thread Balazs Scheidler
On Mon, 2005-12-12 at 22:41 +0100, Sebastian Wilhelmi wrote:
 Hi Balazs,
 
  Is there a specific reason why the barrier functions implemented by
  gatomic.c and gatomic.h are not exported APIs?
 
 We didn't want to create the Swiss army knife for high performance
 multithread programming, just atomic integers. As you are surely aware,
 using memory barriers is far from an easy topic and bugs are easily
 introduced.

Sure. I only wanted to avoid using locks on the fastpath of my
application, but I already solved it with atomic integers and I've read
a lot of articles on memory barriers in the meanwhile.


  And while I am at it, would it be possible to change the atomic
  operations to inline functions? I'd think it is much better to inline
  single-instruction functions as otherwise the call overhead is too
  great.
 
 That would make it impossible to fix the corresponding implementations
 also for already compiled programs, should bugs surface (which they
 already did) and it would also make it impossible to guarantee, that all
 programs really use the same implementation, i.e. with inline functions
 one module could use the asm version (because gcc is used) and the
 second module would use the mutex versions (because another compiler is
 used). That would be very bad of course.

That's a valid point but maybe it would be possible to request inlined
implementations of some functions by the use of a preprocessor symbol,
e.g.

#define G_GLIB_USE_INLINE_FUNCS
#include <glib.h>

Of course this would only be possible if it is not a maintenance
headache, e.g. the function bodies should not be copied by hand, but
autogenerated instead.

Would something like this be accepted?
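
A minimal sketch of what such an opt-in could look like, squeezed into one file. G_GLIB_USE_INLINE_FUNCS is the hypothetical symbol from this proposal, not something GLib defines, and example_atomic_int_add is an illustrative name; C11 atomics are used here only so the sketch builds without GLib.

```c
#include <stdatomic.h>

/* Hypothetical opt-in symbol, taken from the proposal above.
 * GLib does not define or honour it. */
#define G_GLIB_USE_INLINE_FUNCS

/* ---- what a header might contain (names are illustrative) ---- */
#ifdef G_GLIB_USE_INLINE_FUNCS
/* Caller opted in: the body is visible and can be inlined. */
static inline int
example_atomic_int_add (atomic_int *value, int delta)
{
  /* atomic_fetch_add returns the old value; return the new one. */
  return atomic_fetch_add (value, delta) + delta;
}
#else
/* Default: only a prototype, resolved against the shared library. */
int example_atomic_int_add (atomic_int *value, int delta);
#endif
```

Code that does not define the symbol keeps calling the library's out-of-line version, so only callers that ask for it take on the risk of a stale inlined body.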

-- 
Bazsi



Re: inlining glib functions (Was: public barrier functions)

2005-12-13 Thread muppet

Gustavo J. A. M. Carneiro said:
   IMHO, some functions are obvious candidates for inlining, regardless
 of any profiling done on them.  For instance:

 gchar*
 g_strdup (const gchar *str)
 {
   gchar *new_str;
   gsize length;

   if (str)
     {
       length = strlen (str) + 1;
       new_str = g_new (char, length);
       memcpy (new_str, str, length);
     }
   else
     new_str = NULL;

   return new_str;
 }

 This function is trivial.  I doubt you'll ever find any new bugs in it.
 It is called in many places.  So why pay a performance penalty when you
 could easily avoid it?  Glib has many such small functions.

g_strdup() is a *very* poor example of a function that could be inlined for
performance.  The function is trivial, yes, but it calls strlen(), malloc(),
and memcpy() --- three operations which are going to swamp the time spent in
making a call to a real symbol for g_strdup().  This would be a misguided
optimization, throwing away the ability to fix bugs behind the scenes for a
negligible speed improvement.


-- 
muppet scott at asofyet dot org



Re: inlining glib functions (Was: public barrier functions)

2005-12-13 Thread Tim Janik

On Tue, 13 Dec 2005, Gustavo J. A. M. Carneiro wrote:


On Tue, 2005-12-13 at 17:11 +0100, Tim Janik wrote:



more important than _how_ to inline is _what_ and _why_ to inline.
in general, things that can easily and reasonably be inlined have
already been provided as inlined functions or macros in the glib headers.
so for functions that are not inlined but you think _should_ be inlined,
a persuasive argument should be given, e.g. a profiling scenario where the
function in question shows up with significant figures and significant
timing improvements for using the inlined version.
the g_atomic_* functions are a good example of this (see profiling figures
mentioned in the original thread), but they are still not inlined for other
reasons.


 IMHO, some functions are obvious candidates for inlining, regardless
of any profiling done on them.  For instance:

gchar*
g_strdup (const gchar *str)
{
  gchar *new_str;
  gsize length;

  if (str)
    {
      length = strlen (str) + 1;
      new_str = g_new (char, length);
      memcpy (new_str, str, length);
    }
  else
    new_str = NULL;

  return new_str;
}

This function is trivial.  I doubt you'll ever find any new bugs in it.
It is called in many places.  So why pay a performance penalty when you
could easily avoid it?


inlining doesn't automatically mean performance improvements and
not inlining doesn't automatically cause performance penalties.

if you start to inline lots of widely used small functions in
non performance critical code sections, all you've gained is a
bigger code section size and less likelihood of warm instruction
caches (that becomes especially critical when starting to bloat
tight loops due to inlining).
now consider that 90% of a program's runtime is spent in 10% of its
code, that means 90% of your inlining does occur in non performance
critical sections.
that's why modern compilers use tunable heuristics to decide about
automated inlining and don't stupidly inline everything they can.

what you're suggesting is blind optimization, experienced programmers
will tell you that this will result in more harm than good. profiling
a critical section, and maybe inlining/optimizing a single string
copy in a critical loop can gain you ten- or hundredfold the improvements
that you could get from some shallow global optimization.

you don't even need to believe me, just start googling for
premature and optimization and read, there's enough stuff
out there to make your christmas holidays ;)



 Glib has many such small functions.

[ BTW, if (str) could be changed to if (G_LIKELY(str)) ]


yes it could, and it would make sense. the best thing you can do to
make sure such improvements are integrated, is to submit complete
patches for such changes (including changelog entries) so we only
need to apply, compile and test them and are done.
(and if you have/need commit access, after a number of quality
submissions that is usually granted because it can save us
additional work.)
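
As a rough illustration of the G_LIKELY suggestion, here is the same function with a branch hint. EXAMPLE_LIKELY is a stand-in so the sketch builds without GLib (assuming a GCC-compatible compiler for __builtin_expect), and malloc() replaces g_new(); neither is how GLib itself is written.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for GLib's G_LIKELY, so the sketch builds without GLib.
 * On GCC-compatible compilers it hints that the condition is usually true. */
#if defined(__GNUC__)
#define EXAMPLE_LIKELY(expr) (__builtin_expect (!!(expr), 1))
#else
#define EXAMPLE_LIKELY(expr) (expr)
#endif

/* Same logic as the g_strdup() quoted above, with malloc() in place of g_new(). */
char *
example_strdup (const char *str)
{
  char *new_str;
  size_t length;

  if (EXAMPLE_LIKELY (str))
    {
      length = strlen (str) + 1;
      new_str = malloc (length);
      if (new_str)
        memcpy (new_str, str, length);
    }
  else
    new_str = NULL;

  return new_str;
}
```

The hint only influences how the compiler lays out the two branches; the result for NULL and non-NULL input is unchanged.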



 One other thing; it is well known that inline functions are better
than macros:
- Give you better type safety;
- Less cryptic warnings/errors when calling them with wrong types
- For debugging, you can still disable inlining through the CFLAGS in
order to step into the inline functions in step by step debugging;

 So why not start using fewer macros and more inline functions?


well, we're not completely unaware of the type safety inline functions
can offer over some macro uses ;)

why don't you start to submit patches for macros where you think we
really should have used an inlined function and we discuss specific
cases then?
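
To make the macro versus inline comparison concrete, here is a small sketch with made-up names. Besides the type checking mentioned above, it shows one further well-known difference that is not listed in the message: a macro may evaluate its arguments more than once.

```c
/* Macro in the style of a typical MAX(): each argument may be
 * evaluated more than once, and any type is accepted silently. */
#define EXAMPLE_MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Inline function: arguments are evaluated exactly once and
 * checked against the declared parameter types. */
static inline int
example_max (int a, int b)
{
  return a > b ? a : b;
}

/* Helper with a visible side effect: bumps the counter and returns it. */
int
next_value (int *counter)
{
  return ++*counter;
}
```

With a counter starting at 0, EXAMPLE_MAX (next_value (&c), 0) calls the helper twice and yields 2, while example_max (next_value (&c), 0) calls it once and yields 1.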

you can take a look at the existing glib headers to see how we do inlined
functions, and read the comments which describe how inlined functions
still are built in a non-inlined version into glib.


--
Gustavo J. A. M. Carneiro


---
ciaoTJ


Re: inlining glib functions (Was: public barrier functions)

2005-12-13 Thread Gustavo J. A. M. Carneiro
On Tue, 2005-12-13 at 15:40 -0500, muppet wrote:
 Gustavo J. A. M. Carneiro said:
IMHO, some functions are obvious candidates for inlining, regardless
  of any profiling done on them.  For instance:
 
  gchar*
  g_strdup (const gchar *str)
  {
    gchar *new_str;
    gsize length;
 
    if (str)
      {
        length = strlen (str) + 1;
        new_str = g_new (char, length);
        memcpy (new_str, str, length);
      }
    else
      new_str = NULL;
 
    return new_str;
  }
 
  This function is trivial.  I doubt you'll ever find any new bugs in it.
  It is called in many places.  So why pay a performance penalty when you
  could easily avoid it?  Glib has many such small functions.
 
 g_strdup() is a *very* poor example of a function that could be inlined for
 performance.  The function is trivial, yes, but it calls strlen(), malloc(),
 and memcpy() --- three operations which are going to swamp the time spent in
 making a call to a real symbol for g_strdup().  This would be a misguided
 optimization,

  OK, I just made one mistake: I forgot that strlen and memcpy are
already inlined, so that function in particular turned out not to be so
small :P

  Some examples of really tiny functions are g_list_find, g_list_length,
g_ascii_dtostr.  About half a dozen assembly instructions (on x86) each.

  throwing away the ability to fix bugs behind the scenes

  I meant this only for functions that are trivial; do you think there's
any chance of anyone ever spotting a bug in g_strdup?

  Regards.

-- 
Gustavo J. A. M. Carneiro
[EMAIL PROTECTED] [EMAIL PROTECTED]
The universe is always one step beyond logic



public barrier functions

2005-12-12 Thread Balazs Scheidler
Hi,

Is there a specific reason why the barrier functions implemented by
gatomic.c and gatomic.h are not exported APIs?

I'd like to avoid locking in some situations where these memory barrier
instructions would come in handy.

One thread:
  ptr = NULL;

Other thread:
  void *my_ptr = ptr;

  if (my_ptr)
    {
    }

Of course this would be a race condition if I was trying to use the
pointer, but if I add reference counting like this:

One thread:
  loc_ptr = ptr;
  ptr = NULL;
  g_data_unref(loc_ptr);

Other thread:
  void *my_ptr = g_data_ref(ptr);

  if (my_ptr)
    {
    }

Unless I miss something, this should work, provided:
  1) g_data_ref handles NULL pointers
  2) the reference count of ptr itself is manipulated with atomic
operations
  3) the CPU ensures proper read/write memory ordering

Now condition 3 is not true on some non-x86 CPUs, in which case I'd need
something like:

One thread:
  loc_ptr = ptr;
  ptr = NULL;
  wmb();
  g_data_unref(loc_ptr);

Other thread:
  void *my_ptr = g_data_ref(ptr);

  if (my_ptr)
    {
    }

Now the question is why the memory barrier macros are hidden in the
gatomic module and not exported.
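
For readers who want to experiment, the pattern above can be sketched with C11 atomics instead of GLib internals. Everything here is an assumption made for illustration: the Data type and the function names are invented, and atomic_thread_fence (memory_order_release) merely stands in for wmb(). The sketch only illustrates the ordering; it does not protect a reader that loaded the pointer just before the writer dropped the last reference.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Illustrative reference-counted object; not a GLib type. */
typedef struct {
  atomic_int ref_count;
  int payload;
} Data;

/* Plays the role of `ptr` in the message above. */
static _Atomic (Data *) shared;

Data *
data_new (int payload)
{
  Data *d = malloc (sizeof *d);
  if (d)
    {
      atomic_init (&d->ref_count, 1);
      d->payload = payload;
    }
  return d;
}

/* Condition 1: accepts NULL. Condition 2: the count is atomic. */
Data *
data_ref (Data *d)
{
  if (d)
    atomic_fetch_add (&d->ref_count, 1);
  return d;
}

void
data_unref (Data *d)
{
  /* fetch_sub returns the previous value; 1 means this was the last ref. */
  if (d && atomic_fetch_sub (&d->ref_count, 1) == 1)
    free (d);
}

/* Hands the caller's reference over to `shared`. */
void
publish (Data *d)
{
  atomic_store_explicit (&shared, d, memory_order_release);
}

/* "One thread": clear the pointer, fence, then drop the reference. */
void
unpublish (void)
{
  Data *loc_ptr = atomic_load_explicit (&shared, memory_order_relaxed);
  atomic_store_explicit (&shared, NULL, memory_order_relaxed);
  atomic_thread_fence (memory_order_release);   /* stand-in for wmb() */
  data_unref (loc_ptr);
}

/* "Other thread": returns a new reference, or NULL if nothing is published. */
Data *
acquire (void)
{
  return data_ref (atomic_load_explicit (&shared, memory_order_acquire));
}
```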

And while I am at it, would it be possible to change the atomic
operations to inline functions? I'd think it is much better to inline
single-instruction functions as otherwise the call overhead is too
great.

Thanks in advance,

-- 
Bazsi



Re: inlining glib functions (Was: public barrier functions)

2005-12-12 Thread Balazs Scheidler
On Mon, 2005-12-12 at 18:44 +, Gustavo J. A. M. Carneiro wrote:
 On Mon, 2005-12-12 at 19:29 +0100, Balazs Scheidler wrote:
 [...]
  
  And while I am at it, would it be possible to change the atomic
  operations to inline functions? I'd think it is much better to inline
  single-instruction functions as otherwise the call overhead is too
  great.
 
   I agree.  Also many other glib functions could be static inline in the
 public header files.  For instance, many of the functions in glist.c and
 gslist.c are really tiny, thus could easily be inlined, but aren't
 because the compiler has no access to their implementation, only to
 their prototype.

One problem I see with this is binary compatibility. The shared lib
version of glib has to provide the old non-inlined symbols, and simply
moving the functions to the header as static inline would remove those
symbols, even though I would not be surprised if this could be worked
around with some gcc trickery, something along the lines of:

gatomic.h:

static inline void
g_atomic_int_inc(gint *value)
{
  ...
}

ginlineimpls.c (probably auto-generated in some way):

#define g_atomic_int_inc __inline_g_atomic_int_inc
#include <gatomic.h>
#undef g_atomic_int_inc

void
g_atomic_int_inc(gint *value)
{
  __inline_g_atomic_int_inc(value);
}
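
The rename trick can be tried out in a single file. The sketch below is an assumption-laden illustration, not GLib code: the names are invented, the body is a plain increment rather than an atomic one, and a non-reserved name replaces the __inline_ prefix because identifiers starting with two underscores are reserved for the implementation.

```c
/* ---- stand-in for the public header ("example_counter.h") ---- */
/* While building the library, the rename below makes the header's
 * inline definition come out as example_inline_counter_inc(). */
#define example_counter_inc example_inline_counter_inc

static inline void
example_counter_inc (int *value)
{
  *value += 1;   /* placeholder body; a real one would be atomic */
}

#undef example_counter_inc

/* ---- library side: the real, exported symbol ---- */
/* Old binaries keep linking against this out-of-line function. */
void
example_counter_inc (int *value)
{
  example_inline_counter_inc (value);
}
```

Application code that includes the header without the rename sees only the static inline version, while the shared library still exports a symbol of the same name for programs compiled earlier.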

Other opinions?

-- 
Bazsi



Re: public barrier functions

2005-12-12 Thread Paul LeoNerd Evans
On Mon, Dec 12, 2005 at 10:41:55PM +0100, Sebastian Wilhelmi wrote:
  And while I am at it, would it be possible to change the atomic
  operations to inline functions? I'd think it is much better to inline
  single-instruction functions as otherwise the call overhead is too
  great.
 
 That would make it impossible to fix the corresponding implementations
 also for already compiled programs, should bugs surface (which they
 already did) and it would also make it impossible to guarantee, that all
 programs really use the same implementation, i.e. with inline functions
 one module could use the asm version (because gcc is used) and the
 second module would use the mutex versions (because another compiler is
 used). That would be very bad of course.

Yes, I'd agree here. I think it's more important to have a consistent
library of such operations across all programs, and an easy way to apply
bugfixes, than it is to have down-to-the-cycle optimisation.

But I'd be interested to see some benchmarks; see how much this
actually matters. Run a typical program twice; once with functions and
once with some inlines/macros. It's quite likely that in a real-world
program, the ratio of time it actually spends doing the atomic operation
function calls, to the amount of CPU time in general, will actually be
rather small indeed. Such an optimisation is likely to be of little
actual benefit, for the cost it brings.
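
A comparison of this kind could start from something like the following micro-benchmark sketch. It is not the benchmark used by the GLib developers: the names are invented, C11 atomics replace the g_atomic_* functions, __attribute__((noinline)) is a GCC/Clang extension, and a call within one program does not reproduce the extra jump of a shared-library call.

```c
#include <stdatomic.h>
#include <time.h>

/* Inline variant: the compiler may expand this at the call site
 * when optimization is enabled. */
static inline void
inc_inline (atomic_int *value)
{
  atomic_fetch_add (value, 1);
}

/* Out-of-line variant: noinline forces a real call on GCC/Clang. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
void
inc_function (atomic_int *value)
{
  atomic_fetch_add (value, 1);
}

/* Each timer runs `iterations` increments and returns CPU seconds used. */
double
time_inline (atomic_int *value, long iterations)
{
  clock_t start = clock ();
  for (long i = 0; i < iterations; i++)
    inc_inline (value);
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}

double
time_function (atomic_int *value, long iterations)
{
  clock_t start = clock ();
  for (long i = 0; i < iterations; i++)
    inc_function (value);
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Running both timers with a large iteration count gives a first impression of the call overhead in isolation; whether that overhead matters in a real program is exactly the open question raised above.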

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/

