Re: Gtk+ unit tests (brainstorming)

2006-11-16 Thread Tim Janik
On Thu, 26 Oct 2006, Tristan Van Berkom wrote:

 Tim Janik wrote:

 (sometimes one property has no
 meaning if another one hasn't been set up yet - in which case a
 g_return_if_fail() guard would be appropriate).


 wrong, some property values are intentionally set up to support
 free-order settings, e.g. foo=5; foo-set=true; or foo-set=true; foo=5;
 every order restriction or property dependency that is introduced in the
 property API makes generic handling of properties harder or sometimes
 impossible, so it should be avoided at all costs.
 (examples of generic property handling code are language bindings,
 property editors, glade, gtk-doc)

 I think that's a little uncooperative - wouldn't you say, on the other hand,
 that the independence of property ordering - aside from possible really weird
 corner cases - should be promoted as much as possible to make life easier for
 language binding writers, property editors and such ?

yes, that is exactly what i was trying to say. sorry if it was hard to
understand.
if at all possible, any ordering should be supported when
setting properties on an object. at least, no restrictions should be
introduced by an object implementation that can reasonably be worked
around.
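
(for illustration, a minimal sketch of what such a free-order implementation
can look like, using the foo / foo-set pair quoted above; the MyObject type,
the property IDs and the cast macro are hypothetical, not existing GLib/Gtk+
code:)

#include <glib-object.h>

typedef struct {
  GObject  parent_instance;
  gint     foo;
  gboolean foo_set;
} MyObject;

enum { PROP_0, PROP_FOO, PROP_FOO_SET };
#define MY_OBJECT(obj) ((MyObject*) (obj))   /* unchecked cast, sketch only */

static void
my_object_set_property (GObject      *object,
                        guint         prop_id,
                        const GValue *value,
                        GParamSpec   *pspec)
{
  MyObject *self = MY_OBJECT (object);
  switch (prop_id)
    {
    case PROP_FOO:
      /* just store the value, don't require foo-set to be TRUE already */
      self->foo = g_value_get_int (value);
      break;
    case PROP_FOO_SET:
      /* likewise, store the flag without requiring foo to be set first */
      self->foo_set = g_value_get_boolean (value);
      break;
    default:
      G_OBJECT_WARN_INVALID_PROPERTY_ID (object, prop_id, pspec);
      break;
    }
}

both orderings end up in the same object state, because neither case
inspects the other field.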

for unit tests, we can simply do things like pick properties in random
orders and set them to random values. we can then successively fix gtk
cases that unreasonably rely on property orders, or add rules to the
test cases about non-fixable ordering requirements.
note that *some* ordering support is already present in the GObject
API by flagging properties as CONSTRUCT or CONSTRUCT_ONLY.
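
(a rough sketch of such a randomized test; the shuffle helper is made up, and
each pspec's default value stands in for "a random value within the pspec's
range" - this is not existing Gtk+ test code:)

#include <gtk/gtk.h>

static void
test_properties_in_random_order (GType widget_type)
{
  GtkWidget *widget = g_object_new (widget_type, NULL);
  guint n_props = 0, i;
  GParamSpec **props =
    g_object_class_list_properties (G_OBJECT_GET_CLASS (widget), &n_props);
  guint *order = g_new (guint, n_props);

  for (i = 0; i < n_props; i++)
    order[i] = i;
  for (i = n_props; i > 1; i--)          /* Fisher-Yates shuffle */
    {
      guint j = g_random_int_range (0, i);
      guint t = order[i - 1]; order[i - 1] = order[j]; order[j] = t;
    }

  for (i = 0; i < n_props; i++)
    {
      GParamSpec *pspec = props[order[i]];
      GValue value = { 0, };
      if (!(pspec->flags & G_PARAM_WRITABLE) ||
          (pspec->flags & G_PARAM_CONSTRUCT_ONLY))
        continue;
      g_value_init (&value, G_PARAM_SPEC_VALUE_TYPE (pspec));
      /* the default stands in for a random value within the pspec's range */
      g_param_value_set_default (pspec, &value);
      g_object_set_property (G_OBJECT (widget), pspec->name, &value);
      g_value_unset (&value);
    }

  g_free (order);
  g_free (props);
  gtk_widget_destroy (widget);
}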

 Ummm, while I'm here - I'd also like to say that - (I'm not about to
 dig up the quote from the original email but) - I think that there
 is some value to reporting all the tests that fail - even after one
 of the tests has failed, based on the same principles that you'd
 want your compiler to tell you everything that went wrong in its
 parse (not just the first error),
[...]
 Sure, I'll elaborate. what I mean is - if you don't want to pollute
 the developer's console with a lot of errors from every failing test,
 you can redirect them to some log file and only say
 "at least one test failed" - maybe depict which one was the first
 failure - and then point to the log.

 Your arguments against collecting all the failures seem to
 apply fine to gcc output too - and yes, you might usually end up just
 fixing the first error and recompiling, yet I manage to still appreciate
 this feature of verbosity - delivered at the not too high cost of
 wording a makefile a little differently (for unit tests, that is) - I'd
 think.

ok thanks, i think i get the idea now ;)
i've addressed getting more than a single test failure reported without any
need to modify tests in yesterday's email on this topic:
   http://mail.gnome.org/archives/gtk-devel-list/2006-November/msg00077.html
i'd hope that also covers what you're looking for.

 Cheers,
 -Tristan

---
ciaoTJ


Re: Gtk+ unit tests (brainstorming)

2006-11-16 Thread Iago Toral Quiroga
On Wed, 2006-11-15 at 10:51 +0100, Iago Toral Quiroga wrote:
   I'll add here some points supporting Check ;):
  
  ok, addressing them one by one, since i see multiple reasons for not
  using Check ;)
  
 [...]
  it's not clear that Check (besides being an additional dependency in
  itself) fulfils all the portability requirements of glib/gtk+ for these
  cases though.
  
 [...]
  - Check may be widely used, but is presented as "[...] at the moment only
  sporadically maintained" (http://check.sourceforge.net/).
  that alone causes me to veto any Check dependency for glib/gtk
  already ;)
 
 I've just asked Chris Pickett (Check maintainer) about these issues, so
 he can confirm. I'll forward his opinion in a later mail. 

Here it is:

--

Hi Iago,

Well, I looked at most C unit testing frameworks out there, and ended up
settling on Check.

That I wrote "Check is sporadically maintained" on the web page means that
it has reached a point where it is fairly stable, and it does the job 
well for people that use it.  You can quote me on that if you want (in 
fact, I'm going to update the web page).  There are lots of users, a 
very low-volume mailing list, and not very many open bugs.  Search 
Google for srunner_create or something and you will see what I mean.

It has some problems with failing its own unit tests at the moment when 
built, but I think it has to do with some hard-coded timeout in the unit
tests and the speed of newer processors.  I haven't actually encountered
any problems using it myself.

As for portability, I don't think there are any serious issues, I 
remember seeing a Windows patch somewhere.  I recently updated it to 
Autoconf 2.50+ and switched the documentation from DocBook to Texinfo.

I know people love to rewrite the world, but in this case, I would 
recommend just using Check for now, and if any problems are encountered,
then write some throwaway scripts to convert the tests to a new format, 
or fix what's actually broken.  It can't be difficult to do either way, 
and I think it would save a lot of time.  Right now, they're proposing a
big speculative design before knowing through experience what their 
needs for GTK+ really are, and experience will help a lot: either they 
will say, "Oh, hey, Check is actually great!" or they will say, "Damn, 
these are all the things that sucked about Check and let's make sure we 
get them right this time!"

Probably my best general advice is not to write too many tests, and not 
to go crazy testing assertions, but that has nothing to do with Check. 
You might want to get gcov working at the same time.

You might also do well to send some emails to other projects, e.g. 
GStreamer, and find out what they think of it, whether they would 
recommend it, etc. etc.  I'd be interested to know what they say.  Hmm, 
I just looked at email #3 and I see Stefan from GStreamer recommending 
Check.  Well, I would just listen to him, quite frankly.

Cheers,
Chris

Iago Toral Quiroga wrote:
 Hi Chris,
 
 there is some debate in GTK+ about unit testing. I tried to convince
 people to use Check as framework but it seems they prefer doing
 something from scratch.
 
 The main points against Check that they have provided are:
 
* The web page (check.sourceforge.net) states that Check is
 "sporadically maintained".
 
* It's not clear that Check fulfils all the portability
requirements
 of glib/gtk+.
 
* They think the functionality needed to fulfill glib/gtk+ testing
 needs can be implemented ad-hoc with not too much effort, avoiding
 another dependency.
 
 What's your opinion about these issues?
 
 Just in case you want to follow the debate:

http://mail.gnome.org/archives/gtk-devel-list/2006-October/msg00093.html

http://mail.gnome.org/archives/gtk-devel-list/2006-October/msg00129.html

http://mail.gnome.org/archives/gtk-devel-list/2006-October/msg00167.html

http://mail.gnome.org/archives/gtk-devel-list/2006-November/msg00077.html
 
 (Actually this is a bigger debate than just using Check or not, so I
 wrote only some URLs that touch the Check debate at some point).
 
 Cheers,
 Iago.





Re: Gtk+ unit tests (brainstorming)

2006-11-15 Thread Iago Toral Quiroga
On Tue, 2006-11-14 at 15:33 +0100, Tim Janik wrote:
  I understand your point, however I still think that being able to get a
  wider report with all the tests failing at a given moment is also
  interesting (for example in a buildbot continuous integration loop, like
  the one being prepared by the build-brigade). Besides, if there is a
  group of people that want to work on fixing bugs at the same time, they
  would need to get a list of tests failing, not only the first one.
 
 well, you can get that to some extent automatically, if you invoke
$ make -k check

Yes, if we split the tests into independent test programs I think that's
a reasonable approach.

 going beyond that would be a bad trade off i think because:

[...]

 c) implementation and test code often has dependencies that won't allow
 testing beyond the occurrence of an error. a simple example is:
   o = object_new();
   ASSERT (o != NULL); /* no point to continue beyond this on error */
   test_function (o);
 a more intimidating case is:
   main() {
 test_gtk_1();
 test_gtk_2();
 test_gtk_3();
 test_gtk_4();
 test_gtk_5();
 // ...
   }
 if any of those test functions (say test_gtk_3) produces a gtk/glib
 error/assertion/warning/critical, the remaining test functions (4, 5, ...)
 are likely to fail for bogus reasons because the libraries entered
 undefined state.
 reports of those subsequent errors (which are likely to be very
 misleading) are useless at best and confusing (in terms of what error
 really matters) at worst.
 yes, forking for each of the test functions works around that (provided
 they are as independent of one another as in the example above), but again,
 this complicates the test implementation (it's not an easy-to-understand
 test program anymore) and debuggability, i.e. affects the 2 main
 properties of a good test program.

Mmm... actually, based on my experience using Check and fork mode, it
does not complicate the test implementation beyond adding a gtk_init()
to each forked test. Forking the tests is done transparently by Check
based on an environment variable. The same applies to debugging;
although it is true that debugging a forked test program is annoying, you
can disable fork mode when debugging by just switching the environment
variable:

$ CK_FORK=no gdb mytest

This shouldn't be a problem if we stop test execution after the first
failed test.
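
(for illustration, a minimal Check-based test along those lines; the suite and
test names are made up, and the gtk_init() call in the test body stands for
the per-fork initialization mentioned above - a sketch, not actual Gtk+ code:)

#include <check.h>
#include <gtk/gtk.h>
#include <stdlib.h>

START_TEST (test_label_text)
{
  GtkWidget *label;

  gtk_init (NULL, NULL);   /* with fork mode on, each test runs in its own child */
  label = gtk_label_new ("hello");
  fail_unless (g_str_equal (gtk_label_get_text (GTK_LABEL (label)), "hello"),
               "label text mismatch");
}
END_TEST

int
main (void)
{
  Suite *s = suite_create ("label");
  TCase *tc = tcase_create ("core");
  SRunner *sr;
  int failed;

  tcase_add_test (tc, test_label_text);
  suite_add_tcase (s, tc);
  sr = srunner_create (s);
  srunner_run_all (sr, CK_NORMAL);   /* CK_FORK=no in the environment disables forking */
  failed = srunner_ntests_failed (sr);
  srunner_free (sr);
  return failed == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}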

  Maybe you would like to test how the library handles invalid input. For
  example, let's say we have a function that accepts a pointer as
  parameter, I think it is worth knowing if that function handles safely
  the case when that pointer is NULL (if that is a not allowed value for
  that parameter) or if it produces a segmentation fault in that case.
 
 no, it really doesn't make sense to test functions outside the defined
 value ranges. that's because when implementing, the only thing you need
 to actually care about from an API perspective is: the defined value ranges.
 besides that, value ranges may compatibly be *extended* in future versions,
 which would make value range restriction tests break unnecessarily.
 if a function is not defined for say (char*)0, adding a test that asserts
 certain behaviour for (char*)0 is effectively *extending* the current
 value range to include (char*)0 and then testing the proper implementation
 of this extended case. the outcome of which would be a CRITICAL or a segfault
 though, and with the very exception of g_critical(), *no* glib/gtk function
 implements this behaviour purposefully, compatibly, or in a documented way.
 so such a test would at best be bogus and unnecessary.

I think the main difference here between your point of view and mine is
that I'm seeing the API from the user side, while you see it from the
developer side. Let me explain:

From a developer point of view, it is ok to say that an API function
only works within a concrete range of values and to test that it really
works ok within that range. However, in practice, user programs are full
of bugs, which usually means that under certain conditions, they are not
using the APIs as they are supposed to be used. That said, any user would
prefer such an API to handle those situations as safely as possible: if I'm
writing a GTK+ application I'd prefer it to safely handle a misuse on
my side and warn me about the issue, rather than break badly due to a
segmentation fault and make me lose all my data ;)

  I'll add here some points supporting Check ;):
 
 ok, addressing them one by one, since i see multiple reasons for not
 using Check ;)
 
[...]
 it's not clear that Check (besides being an additional dependency in
 itself) fulfils all the portability requirements of glib/gtk+ for these
 cases though.
 
[...]
 - Check may be widely used, but is presented as "[...] at the moment only
sporadically maintained" 

Re: Gtk+ unit tests (brainstorming)

2006-11-14 Thread Tim Janik
On Thu, 26 Oct 2006, Iago Toral Quiroga wrote:

 - in the common case, test results should be reduced to a single boolean:
  "all tests passed" vs. "at least one test failed"
many test frameworks provide means to count and report failing tests
(even automake's standard check:-rule), there's little to no merit to
this functionality though.
having/letting more than one test fail and to continue work in an
unrelated area rapidly leads to confusion about which tests are
supposed to work and which aren't, especially in multi-contributor setups.
figuring out whether the right tests passed suddenly requires scanning of
the test logs and remembering the last count of tests that may validly
fail. this defeats the purpose of using a single quick make check run to
be confident that one's changes didn't introduce breakage.
as a result, the whole test harness should always either succeed or
be immediately fixed.

 I understand your point, however I still think that being able to get a
 wider report with all the tests failing at a given moment is also
 interesting (for example in a buildbot continuous integration loop, like
 the one being prepared by the build-brigade). Besides, if there is a
 group of people that want to work on fixing bugs at the same time, they
 would need to get a list of tests failing, not only the first one.

well, you can get that to some extent automatically, if you invoke
   $ make -k check
going beyond that would be a bad trade off i think because:
a) tests are primarily in place to ensure certain functionality is
implemented correctly and continues to work;
b) if things break, tests need to be easy to debug. basically, most of the
time a test fails, you have to engage the debugger, read code/docs,
analyse and fix. tests that need forking or are hard to understand
get in the way of this process, so they should be avoided.
c) implementation and test code often has dependencies that won't allow
testing beyond the occurrence of an error. a simple example is:
  o = object_new();
  ASSERT (o != NULL); /* no point to continue beyond this on error */
  test_function (o);
a more intimidating case is:
  main() {
test_gtk_1();
test_gtk_2();
test_gtk_3();
test_gtk_4();
test_gtk_5();
// ...
  }
if any of those test functions (say test_gtk_3) produces a gtk/glib
error/assertion/warning/critical, the remaining test functions (4, 5, ...)
are likely to fail for bogus reasons because the libraries entered
undefined state.
reports of those subsequent errors (which are likely to be very
misleading) are useless at best and confusing (in terms of what error really
matters) at worst.
yes, forking for each of the test functions works around that (provided
they are as independent of one another as in the example above), but again,
this complicates the test implementation (it's not an easy-to-understand
test program anymore) and debuggability, i.e. affects the 2 main
properties of a good test program.

to sum this up, reporting multiple fine-grained test failures may have some
benefits, mainly those you outlined. but it comes at a certain cost, i.e.
test code complexity and debugging hindrance, which impair the two main
properties of good test programs.
also, consider that make -k check can still get you reports on multiple
test failures, just at a somewhat lower granularity. in fact, it's just low
enough to avoid bogus reports.
so, putting the options face to face, adding fork mode when you don't have to
(i.e. other than for checking the g_error implementation) provides
questionable benefits at significant costs.
that's not an optimal trade off for gtk test programs i'd say, and i'd expect
the same to hold for most other projects.

 - GLib based test programs should never produce a CRITICAL **: or
WARNING **: message and succeed. the reasoning here is that CRITICALs
and WARNINGs are indicators for an invalid program or library state,
anything can follow from this.
since tests are in place to verify correct implementation/operation, an
invalid program state should never be reached. as a consequence, all tests
should upon initialization make CRITICALs and WARNINGs fatal (as if
--g-fatal-warnings was given).

 Maybe you would like to test how the library handles invalid input. For
 example, let's say we have a function that accepts a pointer as
 parameter, I think it is worth knowing if that function handles safely
 the case when that pointer is NULL (if that is a not allowed value for
 that parameter) or if it produces a segmentation fault in that case.

no, it really doesn't make sense to test functions outside the defined
value ranges. that's because when implementing, the only thing you need
to actually care about from an API perspective is: the defined value ranges.
besides that, value ranges may compatibly be *extended* in future 

Re: Gtk+ unit tests (brainstorming)

2006-11-10 Thread Carl Worth
On Tue, 31 Oct 2006 10:26:41 -0800, Carl Worth wrote:
 On Tue, 31 Oct 2006 15:26:35 +0100 (CET), Tim Janik wrote:
  i.e. using averaging, your numbers include uninteresting outliers
  that can result from scheduling artefacts (like measuring a whole second
  for copying a single pixel), and they hide the interesting information,
  which is the fastest possible performance encountered for your test code.

 If computing an average, it's obviously very important to eliminate
 the slow outliers, because they will otherwise skew it radically. What
 cairo-perf is currently doing for outliers is really cheesy,
 (ignoring a fixed percentage of the slowest results). One thing I
 started on was to do adaptive identification of outliers based on the
  Q3 + 1.5 * IQR rule as discussed here:

   http://en.wikipedia.org/wiki/Outlier

For reference (or curiosity), in cairo's performance suite, I've now
changed the cairo-perf program, (which does show me the performance
for the current cairo revision), to report minimum (and median) times
and it does do the adaptive outlier detection mentioned above.
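
(for illustration, the "Q3 + 1.5 * IQR" cutoff boils down to something like
the following sketch; the helper names and the linear-interpolation quartile
estimate are my own assumptions, not the actual cairo-perf code:)

/* assumes the samples have already been sorted in ascending order */
static double
percentile (const double *sorted, int n, double p)
{
  double pos = p * (n - 1);
  int lo = (int) pos;
  double frac = pos - lo;
  if (lo + 1 >= n)
    return sorted[n - 1];
  return sorted[lo] + frac * (sorted[lo + 1] - sorted[lo]);
}

/* returns the number of samples kept after dropping slow outliers */
static int
drop_slow_outliers (const double *sorted, int n)
{
  double q1 = percentile (sorted, n, 0.25);
  double q3 = percentile (sorted, n, 0.75);
  double cutoff = q3 + 1.5 * (q3 - q1);   /* Q3 + 1.5 * IQR */
  int kept = n;
  while (kept > 0 && sorted[kept - 1] > cutoff)
    kept--;
  return kept;
}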

But when I take two of these reports generated separately and compare
them, I'm still seeing more noise than I'd like to see, (things like a
40% change when I _know_ that nothing in that area has changed).

I think one problem that is happening here is that even though we're
doing many iterations for any given test, we're doing them all right
together so some system-wide condition might affect all of them and
get captured in the summary.

So I've now taken a new approach which is working much better. What
I'm doing now for cairo-perf-diff, which shows me the performance
difference between two different revisions of cairo, is to save the
raw timing for every iteration of every test. Then, statistics are
generated only just before the comparison. This makes it easy to go
back and append additional data if some of the results look off. This
has several advantages:

 * I can append more data only for tests where the results look bad,
   so that's much faster.

 * I can run fewer iterations in the first place, since I'll be
   appending more later as needed. This makes the whole process much
   faster.

 * Appending data later means that I'm temporally separating runs for
   the same test and library version, so I'm more immune to random
   system-wide disturbances.

 * Also, when re-running the suite with only a small subset of the
   tests, the two versions of the library are compared at very close
   to the same time, so system-wide changes are less likely to make a
   difference in the result.

I'm really happy with the net result now. I don't even bother worrying
about not using my laptop while the performance suite is running
anymore, since it's quick and easy to correct problems later. And when
I see the results, if some of the results look funny, I re-run just
those tests, and sure enough the goofy stuff just disappears,
(validating my assumption that it was bogus), or it sticks around no
matter how many times I re-run it, (leading me to investigate and
learn about some unexpected performance impact).

And it caches all of those timing samples so it doesn't have to
rebuild or re-run the suite to compare against something it has seen
before, (the fact that git has hashes just sitting there for the
content of every directory made this easy and totally free). The
interface looks like this:

# What's the performance impact of the latest commit?
cairo-perf-diff HEAD

# How has performance changed from 1.2.0 to 1.2.6? from 1.2.6 to now?
cairo-perf-diff 1.2.0 1.2.6
cairo-perf-diff 1.2.6 HEAD

# As above, but force a re-run even though there's cached data:
cairo-perf-diff -f 1.2.6 HEAD

# As above, but only re-run the named tests:
cairo-perf-diff -f 1.2.6 HEAD -- stroke fill

The same ideas could be implemented with any library performance
suite, and with pretty much any revision control system. It is handy
that git makes it easy to name ranges of commits. So, if I
wanted a commit-by-commit report of every change that is unique to
some branch, (let's say, what's on HEAD since 1.2 split off), I could
do something like this:

for rev in $(git rev-list 1.2.6..HEAD); do
    cairo-perf-diff $rev
done

-Carl

PS. Yes, it is embarrassing that no matter what the topic I end up
plugging git eventually.




Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Iago Toral Quiroga
Havoc Pennington wrote:
 Tim Janik wrote:
  
  ah, interesting. could you please explain why you consider it
  such a big win?
  
 
 Without it I think I usually write about 10% coverage, and imagine in my 
 mind that it is 50% or so ;-) I'm guessing this is pretty common.
 
  With it, it was easy to just browse and say "OK, this part isn't tested 
  yet, this part is tested too much so we can speed up the tests", etc.
 
 Also, if someone submits a patch with tests, you can see if their tests 
 are even exercising their code.
 
 It just gives you a way to know how well you're doing and see what else 
 needs doing.

Sure! Tim, you can take a look here to see this in practice:
http://gtktests-buildbot.igalia.com/gnomeslave/gtk+/lcov/gtk/index.html

Those are the code coverage results for the tests I developed. As you
browse the files you can see which code is tested (blue) and which is
not (red).  I think this helps with:

   * Realizing which code your tests are actually covering.
   * Designing new tests so they are not redundant.
   * Analyzing which execution branches are not tested for a given
interface.
   * Easily checking which files have more tests and which ones need more
testing work based on coverage %.

Iago.


Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Tim Janik
On Wed, 25 Oct 2006, Federico Mena Quintero wrote:

 On Wed, 2006-10-25 at 17:52 +0200, Tim Janik wrote:

 - GLib based test programs should never produce a CRITICAL **: or
WARNING **: message and succeed.

 Definitely.  I have some simple code in autotestfilechooser.c to catch
 this and fail the tests if we get warnings/criticals.

oh? hm, looking at the code, it doesn't seem to be significantly superior 
to --g-fatal-warnings:
   fatal_mask = g_log_set_always_fatal (G_LOG_FATAL_MASK);
   fatal_mask |= G_LOG_LEVEL_WARNING | G_LOG_LEVEL_CRITICAL;
   g_log_set_always_fatal (fatal_mask);
or am i missing something?
note that the error counting in your code doesn't function beyond 1, because
g_log always treats errors as FATAL; that behaviour is not configurable.

 5- simple helper macros to indicate test start/progress/assertions/end.
 (we've at least found these useful to have in Beast.)

 For the file chooser tests, I just return a boolean from the test
 functions, which obviously indicates if the test passed or not.  What
 sort of macros are you envisioning?

just a very few and simple ones that are used to beautify progress 
indication, e.g.:

TSTART(printfargs)  - print info about the test being run
TOK()   - print progress hyphen unconditionally: '-'
TASSERT(cond)   - print progress hyphen if cond==TRUE, abort otherwise
TCHECK(cond)- like TASSERT() but skip hyphen printing
TDONE() - finish test info / progress bar

for simple tests, i just use TSTART/TASSERT/TDONE, but for tests that explore
a wide range of numbers in a loop, a combination of TCHECK/TOK works better
to avoid excess progress indication. e.g.:

   TSTART ("unichar isalnum");
   TASSERT (g_unichar_isalnum ('A') == TRUE);   // prints hyphen: '-'
   TASSERT (g_unichar_isalnum ('?') == FALSE);  // prints hyphen: '-'
   for (uint i = 0; i < 100; i++)
     {
       gunichar uc = rand() % (0x100 << (i % 24));
       if (i % 2 == 0)
         TOK(); // prints hyphen: '-'
       gboolean bb = Unichar::isalnum (uc);
       gboolean gb = g_unichar_isalnum (uc);
       TCHECK (bb == gb);   // silent assertion
     }
   TDONE();

produces:

unichar isalnum: [---------------]

if any of the checks fail, all of __FILE__, __LINE__, __PRETTY_FUNCTION__
and cond are printed of course, e.g.:
** ERROR **: strings.cc:270:int main(int, char**)(): assertion failed: 
g_unichar_isalnum ('?') == true

having java-style TASSERT_EQUAL(val1,val2) could probably also be nice,
in order to print the mismatching values with the error message.
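
(a quick sketch of what such a macro could look like for integral values -
hypothetical, not existing Beast/GLib API:)

#include <glib.h>

#define TASSERT_EQUAL(val1, val2)  do {                                   \
  long __v1 = (long) (val1), __v2 = (long) (val2);                        \
  if (__v1 == __v2)                                                       \
    g_print ("-");             /* progress hyphen, like TASSERT() */      \
  else                                                                    \
    g_error ("%s:%d:%s(): assertion failed: %s == %s (%ld != %ld)",       \
             __FILE__, __LINE__, __PRETTY_FUNCTION__,                     \
             #val1, #val2, __v1, __v2);                                   \
} while (0)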


 - try setting  getting all widget properties on all widgets over the full
value ranges (sparsely covered by means of random numbers for instance)
 - try setting  getting all container child properties analogously

 I believe the OSDL tests do this already.  We should steal that code;
 there's a *lot* of properties to be tested!

not sure what code you're referring to. but what i meant here is code that
generically queries all properties and explores their value range. this
kind of generic querying code is available in a lot of places already (beast,
gle, gtk prop editor, libglade, gtk-doc, LBs, etc.)
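
(as a rough illustration of that kind of generic querying - a hypothetical
helper that lists every property of a type and, for the int-typed ones,
prints the value range declared in the pspec; other pspec types would be
handled analogously:)

#include <glib-object.h>

static void
dump_property_ranges (GType type)
{
  GObjectClass *klass = g_type_class_ref (type);
  guint n_props = 0, i;
  GParamSpec **props = g_object_class_list_properties (klass, &n_props);

  for (i = 0; i < n_props; i++)
    {
      GParamSpec *pspec = props[i];
      if (G_IS_PARAM_SPEC_INT (pspec))
        {
          GParamSpecInt *ispec = G_PARAM_SPEC_INT (pspec);
          g_print ("%s.%s: int [%d..%d]\n", g_type_name (type),
                   pspec->name, ispec->minimum, ispec->maximum);
        }
      else
        g_print ("%s.%s: %s\n", g_type_name (type), pspec->name,
                 g_type_name (G_PARAM_SPEC_VALUE_TYPE (pspec)));
    }
  g_free (props);
  g_type_class_unref (klass);
}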


 - for all widget types, create and destroy them in a loop to:
a) measure basic object setup performance
b) catch obvious leaks
(these would be slowcheck/perf tests)

 Yeah.  GtkWidgetProfiler (in gtk+/perf) will help with this.  Manu
 Cornet has improved it a lot for his theme engine torturer, so we should
 use *that* version :)

hm, i really don't have much of an idea about what it does beyond exposing
the main window in a loop ;) 
gtk+/perf/README looks interesting though, are there any docs on this
beyond that file?

  Federico

---
ciaoTJ


Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Tim Janik
On Wed, 25 Oct 2006, Carl Worth wrote:

 On Wed, 25 Oct 2006 12:40:27 -0500, Federico Mena Quintero wrote:

 There are some things I really don't like in cairo's make check
 suite right now:

 2. The tests take forever to link. Each test right now is a separate
   program. I chose this originally so that I could easily execute
   individual tests, (something I still do regularly and still
   require). The problem is that with any change to the library, make
   check goes through the horrifically slow process of using libtool
   to re-link the hundred or so programs. One idea that's been floated
   to fix this is something like a single test program that links with
   the library, and then dlopens each test module (or something like
   this). Nothing like that has been implemented yet.

that won't quite work either, because libtool has to link shared modules
also. and that takes even longer. for beast, that's the sole issue, the
plugins/ dir takes forever to build and forever to install (libtool relinks
upon installation).

to avoid this type of hassle with the test programs, what we've been
doing is basically to put multiple tests into a single program, i.e.

static void
test_paths (void)
{
   TSTART ("Path handling");
   TASSERT (...);
   TDONE();
}
[...]

int
main (int   argc,
   char *argv[])
{
   birnet_init_test (argc, argv);

   test_cpu_info();
   test_paths();
   test_zintern();
   [...]
   test_virtual_typeid();

   return 0;
}

 Something that is worth stealing is some of the work I've been doing
 in make perf in cairo. I've been putting a lot of effort into
 getting the most reliable numbers out of running performance tests,
 (and doing it is quickly as possible yet). I started with stuff that
 was in Manu's torturer with revamping from Benjamin Otte and I've
 further improved it from there.

 Some of the useful stuff is things such as using CPU performance
 counters for measuring time, (which of course I didn't write, but
 just got from liboil---thanks David!), and then some basic statistical
 analysis---such as reporting the average and standard deviation over
 many short runs timed individually, rather than just timing many runs
 as a whole, (which gives the same information as the average, but
 without any indication of how stable the results are from one to the
 next).

we've looked at cairo's perf output the other day, and one thing we really
failed to understand is why you print average values for your test runs.
granted, there might be some limited use to averaging over multiple runs
to have an idea how much time could be consumed by a particular task,
but much more interesting are other numbers.

i.e. using averaging, your numbers include uninteresting outliers
that can result from scheduling artefacts (like measuring a whole second
for copying a single pixel), and they hide the interesting information,
which is the fastest possible performance encountered for your test code.

printing the median over your benchmark runs would give a much better
indication of the to-be-expected average runtime, because outliers
in either direction are essentially ignored that way.

most interesting for benchmarking and optimization however is the minimum
time a specific operation takes, since in machine execution there is a hard
lower limit we're interested in optimizing. and apart from performance
clock skews, there'll never be minimum time measurement anomalies which
we'd want to ignore.

for beast, we've used a combination of calibration code to figure minimum
test run repetitions and taking measurements minimums, which yields quite
stable and accurate results even in the presence of concurrent background
tasks like project/documentation build processes.
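
(a minimal sketch of the minimum-of-N idea using GTimer; the calibration of
n_runs is left out and the helper name is made up:)

#include <glib.h>

static double
measure_min_run_time (void (*test_func) (void), guint n_runs)
{
  GTimer *timer = g_timer_new ();
  double minimum = G_MAXDOUBLE;
  guint i;

  for (i = 0; i < n_runs; i++)
    {
      g_timer_start (timer);
      test_func ();
      g_timer_stop (timer);
      /* keep only the fastest run; background noise can only slow runs down */
      double elapsed = g_timer_elapsed (timer, NULL);
      if (elapsed < minimum)
        minimum = elapsed;
    }
  g_timer_destroy (timer);
  return minimum;
}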


 The statistical stuff could still be improved, (as I described in a
 recent post to performance-list), but I think it is a reasonable
 starting point.

well, apologies if median/minimum printing is simply still on your TODO ;)


 Oh, and my code also takes care to do things like ensuring that the X
 server has actually finished drawing what you asked it to, (I think
 GtkWidgetProfiler does that as well---but perhaps with a different
 approach). My stuff uses a single-pixel XGetImage just before starting
 or stopping the timer.

why exactly is that a good idea (and better than say XSync())?
does the X server implement logic like globally carrying out all
pending/enqueued drawing commands before allowing any image capturing?
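
(for reference, the trick Carl describes amounts to something like this
sketch; the presumption is that the XGetImage reply can only be generated
once the server has finished rendering to that drawable - whether that
actually beats XSync() is exactly the question above:)

#include <X11/Xlib.h>
#include <X11/Xutil.h>

static void
wait_for_rendering (Display *display, Drawable drawable)
{
  /* request a single pixel back from the server; the blocking round trip
     cannot complete before the server has finished drawing to the drawable */
  XImage *image = XGetImage (display, drawable, 0, 0, 1, 1, AllPlanes, ZPixmap);
  if (image)
    XDestroyImage (image);
}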



 Never forget the truth that Keith Packard likes to share
 often:

   Untested code == Broken code

heh ;)

 -Carl


---
ciaoTJ


Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Stefan Kost
Hi Tim,

Tim Janik wrote:
 Hi all.

 as mentioned in another email already, i've recently worked on improving
 unit test integration in Beast and summarized this in my last blog entry:
http://blogs.gnome.org/view/timj/2006/10/23/0 # Beast and unit testing

   
I did a presentation on Check-based unit tests at last GUADEC. Here are
the slides:
http://www.buzztard.org/files/guadec2005_advanced_unit_testing.pdf

I have had good experiences with Check in GStreamer and also in
buzztard. IMHO it does not make sense to write our own test suite. The
unit tests are optional and IMHO it's not too much to ask developers
to install Check - maybe print a teaser for people that build CVS code
and don't have Check installed. And btw, most distros have it.
 while analysing the need for a testing framework and whether it makes sense
 for GLib and Gtk+ to depend on yet another package for the sole purpose of
 testing, i made/had the following observations/thoughts:

 - Unit tests should run fast - a test taking 1/10th of a second is a slow
unit test, i've mentioned this in my blog entry already.
   
In the slides we were talking about the concept of test aspects. You do
positive and negative tests, performance and stress tests. It makes
sense to organize the test suite to reflect this. It might also make
sense to have one test binary per widget (class). This way you can
easily run single tests. IMHO it's no big deal if tests run slowly. If you
have good test coverage, the whole test run will be slow anyway. For
that purpose we have continuous integration tools like buildbot. That
will happily run your whole test suite even under valgrind and bug the
developer via IRC/mail/whatever.
 - the important aspect about a unit test is the testing it does, not the
 testing framework used. as such, a testing framework doesn't need to
be big, here is one that is implemented in a whole 4 lines of C source,
it gets this point across very well: ;)
  http://www.jera.com/techinfo/jtns/jtn002.html
   
True, but reinventing the wheel usually just means repeating errors.
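
(for reference, the 4-line framework referenced above is, as far as I recall
it, essentially the following - quoted from memory, so treat it as an
approximation:)

/* minunit-style test "framework" */
#define mu_assert(message, test) do { if (!(test)) return message; } while (0)
#define mu_run_test(test) do { char *message = test(); tests_run++; \
                               if (message) return message; } while (0)
extern int tests_run;
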
 - in the common case, test results should be reduced to a single boolean:
  "all tests passed" vs. "at least one test failed"
many test frameworks provide means to count and report failing tests
(even automake's standard check:-rule), there's little to no merit to
this functionality though.
having/letting more than one test fail and to continue work in an
unrelated area rapidly leads to confusion about which tests are
supposed to work and which aren't, especially in multi-contributor setups.
figuring out whether the right tests passed suddenly requires scanning of
the test logs and remembering the last count of tests that may validly
fail. this defeats the purpose of using a single quick make check run to
be confident that one's changes didn't introduce breakage.
as a result, the whole test harness should always either succeed or
be immediately fixed.
   
Totally disagree. The whole point in using the fork-based approach
together with setup/teardown hooks is to provide a sane test environment
for each case. When you run the test suite on a build bot, you want to
know about the overall state (percentage of pass/fail) plus a list of
tests that fail, so that you can fix the issues the tests uncovered.
GStreamer and many apps based on it use a nice logging framework (which
has also previously been offered for glib integration (glog)). The tests
create logs that help to understand the problem.
 - for reasons also mentioned in the aforementioned blog entry it might
be a good idea for Gtk+ as well to split up tests into things that
can quickly be checked, thoroughly be checked but take long, and into
performance/benchmark tests.
these can be executed by make targets check, slowcheck and perf
respectively.

 - for tests that check abort()-like behavior, it can make sense to fork off
 a test program and check whether it fails in the correct place.
 although this type of check is the minority, the basic
 fork functionality shouldn't be reimplemented all over again and warrants
 a test utility function.
   
This is available in 'check'.
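
(a minimal sketch of what such a fork-based utility could look like; the
helper name is made up and error handling is omitted:)

#include <glib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static gboolean
statement_aborts (void (*statement) (void))
{
  pid_t pid = fork ();
  if (pid == 0)
    {
      statement ();   /* child: run the statement... */
      _exit (0);      /* ...and exit cleanly if it returns */
    }
  int status = 0;
  waitpid (pid, &status, 0);
  return WIFSIGNALED (status) && WTERMSIG (status) == SIGABRT;
}
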
 - for time bound tasks it can also make sense to fork a test and after
a certain timeout, abort and fail the test.
   
This is available in 'check'.
 - some test suites offer formal setup mechanisms for test sessions.
i fail to see the necessity for this. main() { } provides useful test
grouping just as well, this idea is applied in an example below.
   
See above.
 - multiple tests may need to support the same set of command line arguments
e.g. --test-slow or --test-perf as outlined in the blog entry.
it makes sense to combine this logic in a common test utility function,
usually pretty small.
   
Agree here. The tests should forward their argc/argv (except when they
test argc/argv handling).
 - homogeneous or consistent test 

Re: Gtk+ unit tests (brainstorming)

2006-10-30 Thread Tim Janik
On Wed, 25 Oct 2006, Havoc Pennington wrote:

 Hi,

 When coding dbus I thought I'd try a project with a focus on unit tests.
 It has (or at least had for a while) exceptionally high test coverage,
 around 75% of basic blocks executed in make check. The coverage-analyzer
 has been busted for a couple years though.

 Here are my thoughts from dbus:

  - the make coverage-report was by far the biggest win I spent time
on.

ah, interesting. could you please explain why you consider it
such a big win?

  - frequently I needed to add extra interfaces or levels of
abstraction to be able to test effectively. For example,
allowing dummy implementations of dependency
module to be inserted underneath a module I was testing.

dbus is heavily conditionalized on a DBUS_BUILD_TESTS
parameter, which allows adding all kinds of test-only code
without fear of bloating the production version. One price
of this is that the tested lib is slightly different from the
production lib.

ah, good to know. though i'd consider that price considerably high for a
project of the size and build time of Gtk+, and where we'd really benefit
from having *many* developers and contributors run make check.
especially, when you have a quite large legacy code base, instead of
developing with conditionalized test hooks from the start.

  - based on how nautilus does unit tests, I put the tests in the file
with the code being tested. The rationale is similar to the
rationale for inline documentation. I think it's a good approach,
but it does require a distinct test build (DBUS_BUILD_TESTS).

sounds interesting as well. the downside is of course the assorted
file growth, and gtk+ already isn't a particularly good citizen in
terms of loc per file ;)

$ wc -l *.c | sort -r | head
   380899 total
14841 gtktreeview.c
11360 gtkaliasdef.c
 9154 gtkiconview.c
 8764 gtkfilechooserdefault.c
 8632 gtktextview.c
 8060 gtkwidget.c

Another advantage of this is that internal code can be tested, while
it may not be possible to fully exercise internal code using the
public API.

thanks for your insight havoc. i'll definitely look into the coverage
report generation at some later point.

 Havoc

---
ciaoTJ


Re: Gtk+ unit tests (brainstorming)

2006-10-30 Thread Havoc Pennington
Tim Janik wrote:
 
 ah, interesting. could you please explain why you consider it
 such a big win?
 

Without it I think I usually write about 10% coverage, and imagine in my 
mind that it is 50% or so ;-) I'm guessing this is pretty common.

With it, it was easy to just browse and say "OK, this part isn't tested 
yet, this part is tested too much so we can speed up the tests", etc.

Also, if someone submits a patch with tests, you can see if their tests 
are even exercising their code.

It just gives you a way to know how well you're doing and see what else 
needs doing.

The special gcov replacement that bitrotted in dbus did a couple things:
  - merged results from multiple executables into one report
  - omitted the test harness itself from the report, i.e. without
my special hacks if you have:
 if (test_failed())
   assert_not_reached();
then gcov would count assert_not_reached() as an uncovered block
in the stats.

Maybe some other changes on top of gcov too, I don't remember.

The report had coverage % for the whole project, each directory, and 
each file in one report. And then it spit out the annotated source for 
each file.

Anyone can cobble together this kind of info with enough work; part of 
the point is that --enable-coverage in configure, plus make 
coverage-report makes it so easy to run the report that it happens much 
more often.

Last time I was looking at fixing it I started hacking on a valgrind 
tool as a replacement for trying to keep up with the ever-changing gcov 
file format, but I didn't really work on that beyond posting a 
half-baked initial patch to the valgrind list.

Havoc



Re: Gtk+ unit tests (brainstorming)

2006-10-30 Thread mathieu lacage
On Mon, 2006-10-30 at 15:34 +0100, Tim Janik wrote:
 On Wed, 25 Oct 2006, Havoc Pennington wrote:
 
  Hi,
 
  When coding dbus I thought I'd try a project with a focus on unit tests.
  It has (or at least had for a while) exceptionally high test coverage,
  around 75% of basic blocks executed in make check. The coverage-analyzer
  has been busted for a couple years though.
 
  Here are my thoughts from dbus:
 
   - the make coverage-report was by far the biggest win I spent time
 on.
 
 ah, interesting. could you please explain why you consider it
 such a big win?

I use a similar setting in a project of mine and the big win is, for me,
the ability to know what needs to be done: it is easy to spot the
locations which are never tested and thus likely to be buggy.

 
   - frequently I needed to add extra interfaces or levels of
 abstraction to be able to test effectively. For example,
 allowing dummy implementations of dependency
 module to be inserted underneath a module I was testing.
 
 dbus is heavily conditionalized on a DBUS_BUILD_TESTS
 parameter, which allows adding all kinds of test-only code
 without fear of bloating the production version. One price
 of this is that the tested lib is slightly different from the
 production lib.
 
 ah, good to know. though i'd consider that price considerably high for a
 project of the size and build time of Gtk+, and where we'd really benefit
 from having *many* developers and contributors run make check.
 especially, when you have a quite large legacy code base, instead of
 developing with conditionalized test hooks from the start.

Usually, these test hooks are always ON for developer builds and only
OFF for release builds. 

   - based on how nautilus does unit tests, I put the tests in the file
 with the code being tested. The rationale is similar to the
 rationale for inline documentation. I think it's a good approach,
 but it does require a distinct test build (DBUS_BUILD_TESTS).
 
  sounds interesting as well. the downside is of course the assorted
 file growth, and gtk+ already isn't a particularly good citizen in
 terms of loc per file ;)
 
 $ wc -l *.c | sort -r | head
380899 total
 14841 gtktreeview.c
 11360 gtkaliasdef.c
  9154 gtkiconview.c
  8764 gtkfilechooserdefault.c
  8632 gtktextview.c
  8060 gtkwidget.c
 
 Another advantage of this is that internal code can be tested, while
 it may not be possible to fully exercise internal code using the
 public API.
 
 thanks for your insight havoc. i'll definitely look into the coverage
 report generation at some later point.

I think havoc used his own report generation tool. You might want to
give lcov a try: it generates some nice-looking html.

Mathieu



Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Andy Wingo
Hi,

On Wed, 2006-10-25 at 17:52 +0200, Tim Janik wrote:
 - GLib based test programs should never produce a CRITICAL **: or
WARNING **: message and succeed.

Sometimes it is useful to check that a critical message was indeed
shown, and then move on. GStreamer installs a log handler that aborts if
criticals are shown unexpectedly, but also has the ability to test that
GStreamer fails appropriately.

For example, here is a test from gstreamer/tests/check/gst/gstobject.c:

/* g_object_new on abstract GstObject should fail */
GST_START_TEST (test_fail_abstract_new)
{
  GstObject *object;

  ASSERT_CRITICAL (object = g_object_new (gst_object_get_type (),
NULL));
  fail_unless (object == NULL,
      "Created an instance of abstract GstObject");
}

I count 44 uses of this pattern in GStreamer core's test suite, based on
libcheck (but with a wrapper).

Regards,

Andy.
-- 
http://wingolog.org/



Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Michael Urman
On 10/25/06, Havoc Pennington [EMAIL PROTECTED] wrote:
 Testing those is like testing segfault handling, i.e. just nuts. The
 behavior is undefined once they print. (Well, for critical anyway.
 g_warning seems to be less consistently used)

Certainly setting out to test all critical cases would not add value
corresponding to the effort; criticals are a different beast I
shouldn't have included. Even for warnings, in certain cases making
error cases testable would slow down real life performance without
benefit. But preemptively deciding it's always impossible to test
resilience of certain known warnings is a misstep. An option like
-Werror is really useful, but hard wiring -Werror is too limiting.

Warnings especially, by not being criticals, imply a contract that the
call will function reasonably (not necessarily correctly) even
during incorrect use. If this is not tested, it will not be correct.

-- 
Michael Urman


Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tim Janik
On Thu, 26 Oct 2006, Michael Urman wrote:

 On 10/25/06, Havoc Pennington [EMAIL PROTECTED] wrote:
 Testing those is like testing segfault handling, i.e. just nuts. The
 behavior is undefined once they print. (Well, for critical anyway.
 g_warning seems to be less consistently used)

 Certainly setting out to test all critical cases would not add value
 corresponding to the effort; criticals are a different beast I
 shouldn't have included. Even for warnings, in certain cases making
 error cases testable would slow down real life performance without
 benefit. But preemptively deciding it's always impossible to test
 resilience of certain known warnings is a misstep. An option like
 -Werror is really useful, but hard wiring -Werror is too limiting.

the analogy doesn't quite hold. and btw, i'm not a fan of -Werror fwiw ;)

 Warnings especially, by not being criticals, imply a contract that the
 call will function reasonably (not necessarily correctly) even
 during incorrect use. If this is not tested, it will not be correct.

this is not quite true in the GLib context. once a program triggers any
of g_assert*(), g_error(), g_warning() or g_critical() (this is nowadays
used for the implementation of return_if_fail), the program/library is
in an undefined state. that's because the g_log() error/warning/critical
cases are widely used to report *programming* errors, not user errors
(g_log wasn't initially designed with that scenario in mind, but this is
how its use effectively turned out).

so for the vast majority of cases, aborting on any such event and failing a
test is the right thing to do.

that doesn't imply no tests should be present at all to test correct
assertions per se. such tests can be implemented by installing g_log
handlers, reconfiguring the fatality of certain log levels and by
employing forked test mode (i addressed most of these in my original
email to some extent). these kinds of tests will be rare though, and
also need to be carefully crafted, because specific assertions and checks
can simply be optimized away under certain configurations, so relying
on them is of questionable merit. you might want to read up on
G_DISABLE_ASSERT or G_DISABLE_CAST_CHECKS on this for more details.
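
(a rough sketch of the g_log-handler approach, with made-up helper names; it
assumes the affected log domain has no handler of its own, so the default
handler receives the message:)

#include <glib.h>

static guint n_criticals = 0;

static void
counting_handler (const gchar *domain, GLogLevelFlags level,
                  const gchar *message, gpointer user_data)
{
  if (level & G_LOG_LEVEL_CRITICAL)
    n_criticals++;
}

static gboolean
statement_emits_critical (void (*statement) (void))
{
  /* make criticals non-fatal and count them for the duration of the check */
  GLogLevelFlags old_fatal = g_log_set_always_fatal (G_LOG_FATAL_MASK);
  GLogFunc old_handler = g_log_set_default_handler (counting_handler, NULL);
  guint before = n_criticals;

  statement ();

  g_log_set_default_handler (old_handler, NULL);
  g_log_set_always_fatal (old_fatal);
  return n_criticals > before;
}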

also, experience (s/experience/bugzilla/) tells us that the majority
of bugs reported against GLib/Gtk+ is not about missing assertions
or errors, which provides a strong hint as to what kind of tests you'd
want to focus on.

 -- 
 Michael Urman

---
ciaoTJ


Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tristan Van Berkom
Havoc Pennington wrote:

Michael Urman wrote:
  

On 10/25/06, Tim Janik [EMAIL PROTECTED] wrote:


- GLib based test programs should never produce a CRITICAL **: or
   WARNING **: message and succeed.
  

It would be good not to make it impossible to test WARNINGs and
CRITICALs. After all, error cases are often the least tested part of
an application, so it's important to make sure the base library
detects and handles the error cases correctly.


Testing those is like testing segfault handling, i.e. just nuts. The 
behavior is undefined once they print. (Well, for critical anyway. 
g_warning seems to be less consistently used)
  

On the other hand - it's happened to me several times to have gtk+
crash when setting up properties on an object generically from glade,
where a simple g_return_if_fail() guard on that public API would have
saved me the trouble of a segfault (sometimes one property has no
meaning if another one hasn't been set up yet - in which case a
g_return_if_fail() guard would be appropriate).

Whether or not it's really relevant, I think that the critical
warnings from a function that wasn't fed the correct arguments,
or whose arguments are invalid because of the current object state, are part of
the contract of the API, and for whatever that's worth, maybe
worth testing for.

Ummm, while I'm here - I'd also like to say that - (I'm not about to
dig up the quote from the original email but) - I think that there
is some value to reporting all the tests that fail - even after one
of the tests has failed, based on the same principles that you'd
want your compiler to tell you everything that went wrong in its
parse (not just the first error), maybe not all spewed out to the console
by default - but having the whole test results in a log file can
save valuable developer time in some situations.

Cheers,
  -Tristan



Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tim Janik
On Thu, 26 Oct 2006, Tristan Van Berkom wrote:

 Havoc Pennington wrote:

 Michael Urman wrote:


 On 10/25/06, Tim Janik [EMAIL PROTECTED] wrote:


 - GLib based test programs should never produce a CRITICAL **: or
   WARNING **: message and succeed.


 It would be good not to make it impossible to test WARNINGs and
 CRITICALs. After all, error cases are often the least tested part of
 an application, so it's important to make sure the base library
 detects and handles the error cases correctly.


 Testing those is like testing segfault handling, i.e. just nuts. The
 behavior is undefined once they print. (Well, for critical anyway.
 g_warning seems to be less consistently used)


 On the other hand - it's happened to me several times to have gtk+
 crash when setting up properties on an object generically from glade,

please make sure to file these as bug reports against Gtk+.
gtk shouldn't crash if properties are set within valid ranges.

 where a simple g_return_if_fail() guard on that public api would have
 saved me the trouble of a segfault

i repeat, relying on g_return_if_fail() for production code is at best
futile and at worst dangerous. it might be defined to a NOP.

 (sometimes one property has no
 meaning if another one hasn't been set up yet - in which case a
 g_return_if_fail() guard would be appropriate).

wrong, some property values are intentionally set up to support 
free-order settings, e.g. foo=5; foo-set=true; or foo-set=true; foo=5;
every order restriction or property dependency that is introduced in the
property API makes generic handling of properties harder or sometimes
impossible, so it should be avoided at all costs.
(examples of generic property handling code are language bindings,
property editors, glade, gtk-doc)

 Whether or not it's really relevant, I think that the critical
 warnings from a function that wasn't fed the correct arguments,
 or whose arguments are invalid because of the current object state, are part of
 the contract of the API, and for whatever that's worth, maybe
 worth testing for.

no, they are not. functions are simply just defined within the range
specified by the implementor/designer, and in this fashion, for a lot
of glib/gtk functions, i'm informing you that occasional return_if_fail
statements that you may or may not happen to find in the glib/gtk code base
are a pure convenience tool to catch programming mistakes more easily.

--- that being said, please read on ;)

the above is intentionally worded conservatively, e.g. to allow removal
of return_if_fail statements in the future, so value ranges a function
operates on can be extended *compatibly*. in practice however, we're
doing our best to add missing warnings/checks that help programmers
and users to code and use our libraries in the most correct and
reliable way. but by definition, a library/program is broken once a
warning/critical/assertion/error has been triggered, and there is no
point in verifying that these are in fact triggered. i.e. you'd
essentially try to verify that you can make the library/program enter
an undefined state by a certain sequence of usage patterns, and that
is rather pointless.

 Ummm, while I'm here - I'd also like to say that - (I'm not about to
 dig up the quote from the original email but) - I think that there
 is some value to reporting all the tests that fail - even after one
 of the tests has failed, based on the same principles that you'd
 want your compiler to tell you everything that went wrong in its
 parse (not just the first error),

well, try to get your compiler to tell you about all compilation errors
in a project covering multiple files or directories if you broke a
multitude of files in different directories/libraries. it simply
won't work for the vast majority of cases, and also isn't useful in
practice. similar to unit tests, most times a compiler yields errors,
you have to fix the first one before being able to move on.

 maybe not all spewed out to the console
 by default - but having the whole test results in a log file can
 save valuable developer time in some situations.

i don't quite see the scenario here, so it might be good if you cared
to elaborate on this. as described in my initial email, if anything breaks,
it either shouldn't be under test or should be fixed, and i provided
reasoning for why that is the case. if you really mean to challenge
this approach, please also make an effort to address the reasoning
i provided.

 Cheers,
  -Tristan

---
ciaoTJ


Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Iago Toral Quiroga
On Wed, 2006-10-25 at 17:52 +0200, Tim Janik wrote:

 - Unit tests should run fast - a test taking 1/10th of a second is a slow
unit test, i've mentioned this in my blog entry already.

Sure, very important, or otherwise developers will tend to neither use
nor maintain the tests.

 - in the common case, test results should be reduced to a single boolean:
  "all tests passed" vs. "at least one test failed"
many test frameworks provide means to count and report failing tests
(even automake's standard check:-rule), there's little to no merit to
this functionality though.
having/letting more than one test fail and to continue work in an
unrelated area rapidly leads to confusion about which tests are
supposed to work and which aren't, especially in multi-contributor setups.
figuring out whether the right tests passed suddenly requires scanning of
the test logs and remembering the last count of tests that may validly
fail. this defeats the purpose of using a single quick make check run to
be confident that one's changes didn't introduce breakage.
as a result, the whole test harness should always either succeed or
be immediately fixed.

I understand your point; however, I still think that being able to get a
wider report with all the tests failing at a given moment is also
interesting (for example in a buildbot continuous integration loop, like
the one being prepared by the build-brigade). Besides, if there is a
group of people who want to work on fixing bugs at the same time, they
would need a list of all the failing tests, not only the first one.

 - for reasons also mentioned in the aforementioned blog entry it might
be a good idea for Gtk+ as well to split up tests into things that
can be checked quickly, things that are checked thoroughly but take long,
and performance/benchmark tests.
these can be executed by the make targets check, slowcheck and perf
respectively.

Yes, seems an excellent idea to me.

 - homogeneous or consistent test output might be desirable in some contexts.

Yes, it is an important point when thinking about a continuous
integration tool for Gnome. If tests for all modules in Gnome agree on a
common output format, then that data can be collected, processed and
presented by a continuous integration tool like buildbot, which would make
it easy to do cool things with those test results. In the build-brigade
we had also talked a bit about this subject.

so far, my experience has been that for simple make check runs, the most
important things are that it's fast enough for people to run frequently
and that it succeeds.
if parts that feel slow are hard to avoid, a progress indicator
can help a lot to overcome the required waiting time. so, here the exact
output isn't too important as long as some progress is displayed.

Yes, good point. 

 - GLib based test programs should never produce a CRITICAL **: or
WARNING **: message and succeed. the reasoning here is that CRITICALs
and WARNINGs are indicators of an invalid program or library state;
anything can follow from this.
since tests are in place to verify correct implementation/operation, an
invalid program state should never be reached. as a consequence, all tests
should upon initialization make CRITICALs and WARNINGs fatal (as if
--g-fatal-warnings had been given).
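
As a minimal sketch of what such an initialization helper could look like,
assuming GLib's g_log_set_always_fatal() and gtk_init(); the name
test_init() is made up here:

    #include <gtk/gtk.h>

    /* sketch only: initialize a test program and make WARNINGs and
     * CRITICALs abort it, as if --g-fatal-warnings had been passed */
    static void
    test_init (int *argc, char ***argv)
    {
      gtk_init (argc, argv);
      g_log_set_always_fatal (G_LOG_LEVEL_WARNING |
                              G_LOG_LEVEL_CRITICAL |
                              G_LOG_LEVEL_ERROR);
    }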

Maybe you would like to test how the library handles invalid input. For
example, let's say we have a function that accepts a pointer as a
parameter: I think it is worth knowing whether that function safely handles
the case where that pointer is NULL (if that is not an allowed value for
that parameter) or whether it produces a segmentation fault in that case.

 as far as a testing framework is needed for GLib/Gtk+, i think it would
 be sufficient to have a pair of common testutils.[hc] files that provide:
 
 1- an initialization function that calls gtk_init() and preparses
 arguments relevant for test programs. this should also make all WARNINGs
 and CRITICALs fatal.
 
 2- a function to register all widget types provided by Gtk+, (useful for
 automated testing).
 
 3- a function to fork off a test and assert it fails in the expected place
 (around a certain statement).
 
 4- it may be helpful to have a fork-off and timeout helper function as well.
 
 5- simple helper macros to indicate test start/progress/assertions/end.
 (we've at least found these useful to have in Beast.)
 
 6- output formatting functions to consistently present performance 
 measurements
 in a machine parsable manner.
 
 
 if i'm not mistaken, test frameworks like Check would only help us out with
 3, 4 and to some extent 5. i don't think this warrants a new package
 dependency, especially since 5 might be highly customized and 3 or 4 could be
 useful to provide generally in GLib.
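
As a rough illustration of points 3 and 4 above, a fork-off helper could
look roughly like the following sketch; the names TestFunc and
test_fork_expect_failure are made up and error handling is kept minimal:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <glib.h>

    typedef void (*TestFunc) (void);

    /* run func in a child process; return TRUE if the child aborted or
     * exited non-zero (a hanging child is killed via SIGALRM after
     * timeout_seconds, which then also shows up as abnormal termination) */
    static gboolean
    test_fork_expect_failure (TestFunc func, guint timeout_seconds)
    {
      int status = 0;
      pid_t pid = fork ();

      g_return_val_if_fail (pid >= 0, FALSE);
      if (pid == 0)                /* child */
        {
          alarm (timeout_seconds);
          func ();
          _exit (0);               /* reaching this means the test did not fail */
        }
      waitpid (pid, &status, 0);   /* parent */
      return !WIFEXITED (status) || WEXITSTATUS (status) != 0;
    }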
 

I'll add here some points supporting Check ;):

As I said in another email, it wouldn't be a dependency 

Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tristan Van Berkom
Tim Janik wrote:

 On Thu, 26 Oct 2006, Tristan Van Berkom wrote:

 Havoc Pennington wrote:

 Michael Urman wrote:


 On 10/25/06, Tim Janik [EMAIL PROTECTED] wrote:


 - GLib based test programs should never produce a CRITICAL **: or
   WARNING **: message and succeed.


 It would be good not to make it impossible to test WARNINGs and
 CRITICALs. After all, error cases are often the least tested part of
 an application, so it's important to make sure the base library
 detects and handles the error cases correctly.


 Testing those is like testing segfault handling, i.e. just nuts. The
 behavior is undefined once they print. (Well, for critical anyway.
 g_warning seems to be less consistently used)


 On the other hand - it's happened to me several times to have gtk+
 crash when setting up properties on an object generically from glade,


 please make sure to file these as bug reports against Gtk+.
 gtk shouldn't crash if properties are set within valid ranges.

Yes siree, usually if not always - I do.

 where a simple g_return_if_fail() guard on that public api would have
 saved me the trouble of a segfault


 i repeat, relying on g_return_if_fail() for production code is at best
 futile and at worst dangerous. it might be defined to a NOP.

Yes, my point here is only that having a warning for misusing the api,
particularly in /development/ code - should be part of the contract of
the api - maybe it doesn't make sense to test for it... fuzzy.

 (sometimes one property has no
 meaning if another one hasn't been set up yet - in which case a
 g_return_if_fail() guard would be appropriate).


 wrong, some property values are intentionally set up to support
 free-order settings. e.g. foo=5; foo-set=true; or foo-set=true; foo=5;
 every order restriction or property dependency that is introduced in the
 property API makes generic handling of properties harder or sometimes
 impossible, so it should be avoided at all costs.
 (examples for generic property handling code are language bindings,
 property editors, glade, gtk-doc)
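
For illustration, with a hypothetical object exposing an integer "foo"
property and a boolean "foo-set" property, both of the following orders
are expected to leave the object in the same state:

    #include <glib-object.h>

    /* "foo" and "foo-set" are hypothetical properties; the call order
     * must not change the resulting object state */
    static void
    set_foo_in_both_orders (GObject *a, GObject *b)
    {
      g_object_set (a, "foo", 5, "foo-set", TRUE, NULL);
      g_object_set (b, "foo-set", TRUE, "foo", 5, NULL);
    }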

I think that's a little uncooperative - wouldn't you say, on the other hand,
that the independence of property ordering - aside from possibly some really
weird corner cases - should be promoted as much as possible to make life
easier for language binding writers, property editors and such?

It is of course a two-way street in the sense that one should not be misled
and not program defensively in this respect.

 Whether or not it's really really relevant, I think that the critical
 warnings from a function that wasn't fed the correct arguments,
 or that are invalid because of the current object state - are part of
 the contract of the api, and for whatever that's worth, maybe
 worth testing for.


 no, they are not. functions are simply defined within the range
 specified by the implementor/designer, and in that capacity, for a lot
 of glib/gtk functions, i'm informing you that the occasional return_if_fail
 statements that you may or may not happen to find in the glib/gtk code base
 are a pure convenience tool to catch programming mistakes more easily.

 --- that being said, please read on ;)

 the above is intentionally worded conservatively, e.g. to allow removal
 of return_if_fail statements in the future, so value ranges a function
 operates on can be extended *compatibly*. in practice however, we're
 doing our best to add missing warnings/checks that help programmers
 and users to code and use our libraries in the most correct and
 reliable way. but by definition, a library/program is broken once a
 warning/critical/assertion/error has been triggered, and there is no
 point in verifying that these are in fact triggered. i.e. you'd
 essentially try to verify that you can make the library/program enter
 an undefined state by a certain sequence of usage patterns, and that
 is rather pointless.
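
A small sketch of that point: the guard below helps during development but
is not a behavioural guarantee; gtk_foo_set_bar(), GtkFoo and GTK_IS_FOO()
are made up here:

    /* hypothetical setter; GtkFoo and GTK_IS_FOO() do not exist in Gtk+ */
    void
    gtk_foo_set_bar (GtkFoo *foo, int bar)
    {
      /* convenience check that catches programming mistakes during
       * development; it compiles to nothing if G_DISABLE_CHECKS is
       * defined, so callers must not rely on it for correctness */
      g_return_if_fail (GTK_IS_FOO (foo));

      foo->bar = bar;
    }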

Yes I see; saying whether it is part of the contract may sound binding,
but an effort to report api misuse is definitely needed. Also, adding
a g_return_if_fail() to your function should probably not constitute an
addition to your test suite (i.e. you'd usually progressively add tests
to ensure that your api complains properly when misused - well,
I guess it's a lot of effort for a low yield).

 Ummm, while I'm here - I'd also like to say that - (I'm not about to
 dig up the quote from the original email but) - I think that there
 is some value to reporting all the tests that fail - even after one
 of the tests has failed, based on the same principles that you'd
 want your compiler to tell you everything that went wrong in its
 parse (not just the first error),


 well, try to get your compiler to tell you about all compilation errors
 in a project covering multiple files or directories if you broke a
 multitude of files in different directories/libraries. it simply
 won't work for the vast majority of cases, and also isn't useful in
 practice. similar to unit tests, most of the time when a compiler yields errors,
 you have to fix the first 

Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Bill Haneman
Hi:

   
 - homogeneous or consistent test output might be desirable in some contexts.
 

 Yes, it is an important point when thinking about a continuous
 integration tool for Gnome. If tests for all modules in Gnome agree on a
 common output format, then that data can be collected, processed and
 presented by a continuous integration tool like buildbot and would make
 it easy to do cool things with those test results. In the build-brigade
 we had also talked a bit on this subject.
   
While I agree that for gtk+, a lightweight unit test framework with 
limited dependencies makes sense, for gnome as a whole, at the Boston 
summit we proposed a test tinderbox system for Gnome based on 
dogtail/LDTP or a similar system, using AT-SPI as the framework. 

This would not only be able to test (and integration test) individual
components and the Gnome stack, but would also allow automated testing of
accessibility support and early detection of regressions in these
areas (which have been major problems throughout the Gnome 2.X series).

See this thread: 
http://mail.gnome.org/archives/dogtail-devel-list/2006-October/msg00011.html
regarding the creation of [EMAIL PROTECTED]

regards

Bill




Re: Gtk+ unit tests (brainstorming)

2006-10-25 Thread Federico Mena Quintero
On Wed, 2006-10-25 at 17:52 +0200, Tim Janik wrote:

 while analysing the need for a testing framework and whether it makes sense
 for GLib and Gtk+ to depend on yet another package for the sole purpose of
 testing, i made/had the following observations/thoughts:

Wooo!  Thanks for working on this, Tim; it would be *fantastic* to unify
all the disparate tests that we have right now, as well as write new
ones.

 - for reasons also mentioned in the aforementioned blog entry it might
be a good idea for Gtk+ as well to split up tests into things that
can be checked quickly, things that are checked thoroughly but take long,
and performance/benchmark tests.
these can be executed by the make targets check, slowcheck and perf
respectively.

This is a good idea.  For testing the file chooser, it helped a lot to
write some black box tests which are essentially smoke tests.  They
just test the very basic functionality of the chooser, without going
into the whole test suite.  The black box tests helped me fix a bug with
many ramifications quickly, since at all times I could ensure that none
of the basic functionality was broken.

Performance tests should definitely be separate.

 - for time bound tasks it can also make sense to fork a test and after
a certain timeout, abort and fail the test.

Yes.  Hans Petter uses this approach for his Flow testing, and it's very
cool.

 - homogeneous or consistent test output might be desirable in some contexts.
so far, my experience has been that for simple make check runs, the most
important things are that it's fast enough for people to run frequently
and that it succeeds.

One thing about this... it would be good to keep megawidgets in separate
tests, but then to have make check run all the tests for all
megawidgets.  That is, I'd like a single
test-all-the-stuff-in-the-treeview and a single
test-all-the-stuff-in-the-file-chooser:  when coding in one, I
normally don't care about the other one.

So if all the test-* programs spit the same output, we can collect it
easily from make check.

Cairo has a very nice testing framework.  We can surely steal ideas from
it.

 - GLib based test programs should never produce a CRITICAL **: or
WARNING **: message and succeed.

Definitely.  I have some simple code in autotestfilechooser.c to catch
this and fail the tests if we get warnings/criticals.

 5- simple helper macros to indicate test start/progress/assertions/end.
 (we've at least found these useful to have in Beast.)

For the file chooser tests, I just return a boolean from the test
functions, which obviously indicates if the test passed or not.  What
sort of macros are you envisioning?
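
Purely as an illustration of point 5 (these are not the Beast macros, just
a guess at their general shape):

    /* illustrative only; not the actual Beast test macros */
    #define TEST_START(name)   g_print ("TEST %s: ", (name))
    #define TEST_TICK()        g_print (".")   /* progress indicator */
    #define TEST_ASSERT(expr)  do { \
        if (!(expr)) \
          g_error ("FAIL %s:%d: %s", __FILE__, __LINE__, #expr); \
      } while (0)
    #define TEST_DONE()        g_print ("OK\n")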

 - for a specific widget type, test input/output conditions of all API
functions (only for valid use cases though)
 - similarly, test all input/output conditions of the Gdk API
 - try setting & getting all widget properties on all widgets over the full
value ranges (sparsely covered by means of random numbers for instance)
 - try setting & getting all container child properties analogously

I believe the OSDL tests do this already.  We should steal that code;
there's a *lot* of properties to be tested!
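
A rough sketch of such generic property coverage - it only pokes default
values rather than sparse random ones, and test_object_properties() is a
made-up name:

    #include <gtk/gtk.h>

    /* set every writable, non-construct-only property of a type;
     * per-type random value generation is left out of this sketch */
    static void
    test_object_properties (GType type)
    {
      GObjectClass *klass = g_type_class_ref (type);
      guint n_props, i;
      GParamSpec **props = g_object_class_list_properties (klass, &n_props);
      GObject *object = g_object_new (type, NULL);

      for (i = 0; i < n_props; i++)
        {
          GParamSpec *pspec = props[i];
          GValue value = { 0, };

          if (!(pspec->flags & G_PARAM_WRITABLE) ||
              (pspec->flags & G_PARAM_CONSTRUCT_ONLY))
            continue;
          g_value_init (&value, G_PARAM_SPEC_VALUE_TYPE (pspec));
          g_param_value_set_default (pspec, &value);
          g_object_set_property (object, pspec->name, &value);
          g_value_unset (&value);
        }

      g_free (props);
      if (g_type_is_a (type, GTK_TYPE_WIDGET))
        {
          g_object_ref_sink (object);
          gtk_widget_destroy (GTK_WIDGET (object));
        }
      g_object_unref (object);
      g_type_class_unref (klass);
    }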

 - check layout algorithms by laying out a child widget that does nothing but
check the coordinates it's laid out at.

Oh, nice idea.  This would be *really* nice even for non-widget
stuff, such as all the tricky layout going on in GtkTreeView and its
cell renderers.
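
One way to sketch the "child that checks its own coordinates" idea without
writing a whole custom widget is to watch the existing "size-allocate"
signal; the expected allocation is of course supplied by the test:

    #include <gtk/gtk.h>

    /* fail the test if the child is not laid out where the test expects;
     * user_data points to the expected GtkAllocation */
    static void
    check_child_allocation (GtkWidget *child, GtkAllocation *allocation,
                            gpointer user_data)
    {
      const GtkAllocation *expected = user_data;

      g_assert (allocation->x == expected->x &&
                allocation->y == expected->y);
      g_assert (allocation->width == expected->width &&
                allocation->height == expected->height);
    }

    /* usage:
     *   g_signal_connect (child, "size-allocate",
     *                     G_CALLBACK (check_child_allocation), &expected);
     */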

 - generically query all key bindings of stock Gtk+ widgets, and activate them,
checking that no warnings/criticals are generated.

Nice.

 - for all widget types, create and destroy them in a loop to:
a) measure basic object setup performance
b) catch obvious leaks
(these would be slowcheck/perf tests)

Yeah.  GtkWidgetProfiler (in gtk+/perf) will help with this.  Manu
Cornet has improved it a lot for his theme engine torturer, so we should
use *that* version :)
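
A minimal sketch of such a create/destroy loop; time_create_destroy() is a
made-up helper, and leak detection is left to external tools here:

    #include <gtk/gtk.h>

    /* average wall-clock time in seconds to create and destroy one
     * instance of the given widget type */
    static gdouble
    time_create_destroy (GType widget_type, guint n_iterations)
    {
      GTimer *timer = g_timer_new ();
      gdouble elapsed;
      guint i;

      for (i = 0; i < n_iterations; i++)
        {
          GtkWidget *widget = g_object_new (widget_type, NULL);

          g_object_ref_sink (widget);
          gtk_widget_destroy (widget);
          g_object_unref (widget);
        }
      elapsed = g_timer_elapsed (timer, NULL);
      g_timer_destroy (timer);
      return elapsed / n_iterations;
    }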

 as always, feedback is appreciated, especially objections/concerns
 regarding the ideas outlined ;)

Again, thanks for working on this.  It will be reassuring to have a good
way to plug in new tests as we add new code to gtk+.  After the
infrastructure in autotestfilechooser was done, it was quite fun to add
new tests; before that, it was extremely cumbersome to start with a
blank sheet of paper every time I wanted to test something.

  Federico



Re: Gtk+ unit tests (brainstorming)

2006-10-25 Thread Michael Urman
On 10/25/06, Tim Janik [EMAIL PROTECTED] wrote:
 - GLib based test programs should never produce a CRITICAL **: or
WARNING **: message and succeed.

It would be good not to make it impossible to test WARNINGs and
CRITICALs. After all, error cases are often the least tested part of
an application, so it's important to make sure the base library
detects and handles the error cases correctly.

-- 
Michael Urman