Gtk+ unit tests (brainstorming)

2006-10-25 Thread Tim Janik
Hi all.

as mentioned in another email already, i've recently worked on improving
unit test integration in Beast and summarized this in my last blog entry:
   http://blogs.gnome.org/view/timj/2006/10/23/0 # Beast and unit testing


while analysing the need for a testing framework and whether it makes sense
for GLib and Gtk+ to depend on yet another package for the sole purpose of
testing, i made/had the following observations/thoughts:

- Unit tests should run fast - a test taking 1/10th of a second is a slow
   unit test, i've mentioned this in my blog entry already.

- the important aspect of a unit test is the testing it does, not the
   testing framework. as such, a testing framework doesn't need to be big;
   here is one implemented in a whole 4 lines of C source, which gets this
   point across very well: ;)
   http://www.jera.com/techinfo/jtns/jtn002.html
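   (for reference, the idea behind that link boils down to something like the
   following - this is a from-memory sketch, not necessarily the exact code
   on that page:)

     /* minunit.h - the entire "framework" */
     #define mu_assert(message, test) \
       do { if (!(test)) return message; } while (0)
     #define mu_run_test(test) \
       do { char *message = test (); tests_run++; \
            if (message) return message; } while (0)
     extern int tests_run;

   a test function returns NULL on success or a failure message otherwise,
   and main() exits non-zero as soon as any test failed.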

- in the common case, test results should be reduced to a single boolean:
 "all tests passed" vs. "at least one test failed"
   many test frameworks provide means to count and report failing tests
   (even automake's standard check:-rule), there's little to no merit to
   this functionality though.
   letting more than one test fail while continuing to work in an
   unrelated area rapidly leads to confusion about which tests are
   supposed to work and which aren't, especially in multi-contributor setups.
   figuring out whether the right tests passed suddenly requires scanning
   the test logs and remembering the last count of tests that may validly
   fail. this defeats the purpose of using a single quick make check run to
   be confident that one's changes didn't introduce breakage.
   as a result, the whole test harness should always either succeed or
   be immediately fixed.

- for reasons also mentioned in the aforementioned blog entry it might
   be a good idea for Gtk+ as well to split up tests into things that
   can be checked quickly, things that are checked thoroughly but take long,
   and performance/benchmark tests.
   these can be executed by make targets check, slowcheck and perf
   respectively.

- for tests that check abort()-like behavior, it can make sense to fork off
   a test program and check whether it fails in the correct place.
   although this type of check is in the minority, the basic
   fork functionality shouldn't be reimplemented all over again and warrants
   a test utility function.

- for time-bound tasks it can also make sense to fork a test and, after
   a certain timeout, abort and fail the test.
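   (purely as an illustration, minimal sketches of what such utility functions
   could look like - the names and details here are made up, not existing
   GLib API, and error checking is omitted:)

     #include <sys/types.h>
     #include <sys/wait.h>
     #include <signal.h>
     #include <unistd.h>

     /* run fn() in a child process; succeed only if it abort()s */
     static int
     test_expect_abort (void (*fn) (void))
     {
       int status = 0;
       pid_t pid = fork ();
       if (pid == 0)
         {
           fn ();
           _exit (0);        /* reaching this point means no abort happened */
         }
       waitpid (pid, &status, 0);
       return WIFSIGNALED (status) && WTERMSIG (status) == SIGABRT;
     }

     /* run fn() in a child process; fail if it exceeds timeout_seconds */
     static int
     test_expect_timely (void (*fn) (void), unsigned int timeout_seconds)
     {
       int status = 0;
       pid_t pid = fork ();
       if (pid == 0)
         {
           alarm (timeout_seconds);   /* SIGALRM terminates the child on timeout */
           fn ();
           _exit (0);
         }
       waitpid (pid, &status, 0);
       return WIFEXITED (status) && WEXITSTATUS (status) == 0;
     }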

- some test suites offer formal setup mechanisms for test "sessions".
   i fail to see the necessity for this. main() { } provides useful test
   grouping just as well; this idea is applied in an example below.

- multiple tests may need to support the same set of command line arguments
   e.g. --test-slow or --test-perf as outlined in the blog entry.
   it makes sense to combine this logic in a common test utility function,
   usually pretty small.
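   (again just a sketch of such a shared helper, with made-up names:)

     #include <string.h>
     #include <glib.h>

     static gboolean test_slow = FALSE;   /* also run the long-running checks */
     static gboolean test_perf = FALSE;   /* also run the benchmark tests */

     static void
     test_parse_args (int *argc, char ***argv)
     {
       int i;
       for (i = 1; i < *argc; i++)
         {
           if (strcmp ((*argv)[i], "--test-slow") == 0)
             test_slow = TRUE;
           else if (strcmp ((*argv)[i], "--test-perf") == 0)
             test_perf = TRUE;
         }
     }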

- homogeneous or consistent test output might be desirable in some contexts.
   so far, my experience has been that for simple make check runs, the most
   important things are that it's fast enough for people to run frequently
   and that it succeeds.
   if parts that feel somewhat slow are hard to avoid, a progress indicator
   can help a lot to make the required waiting time bearable. so, here the exact
   output isn't too important as long as some progress is displayed.
   for performance measurements it makes sense to use somewhat canonical
   output formats though (ideally machine parsable) and it can simplify the
   test implementations if performance results may be intermixed with existing
   test outputs (such as progress indicators).
   i've mentioned this in my blog entry as well; it boils down to using a
   small set of utility functions to format machine-detectable performance
   test result output.
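   (as a sketch, such a formatting function could be as simple as the
   following - the exact output format here is just an example, not an
   agreed-upon one:)

     #include <glib.h>

     /* print a performance result in a greppable, machine-parsable form */
     static void
     test_report_perf (const char *name, double value, const char *unit)
     {
       g_print ("PERF: %s: %.6f %s\n", name, value, unit);
     }

   a fixed prefix like "PERF:" lets results be extracted from the test output
   even when they are intermixed with progress indicators.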

- GLib based test programs should never produce a "CRITICAL **:" or
   "WARNING **:" message and succeed. the reasoning here is that CRITICALs
   and WARNINGs are indicators for an invalid program or library state,
   anything can follow from this.
   since tests are in place to verify correct implementation/operation, an
   invalid program state should never be reached. as a consequence, all tests
   should upon initialization make CRITICALs and WARNINGs fatal (as if
   --g-fatal-warnings was given).
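   (in code, this boils down to something like the following in the test
   initialization:)

     /* make WARNINGs and CRITICALs fatal, as if --g-fatal-warnings was given */
     GLogLevelFlags fatal_mask = g_log_set_always_fatal (G_LOG_FATAL_MASK);
     fatal_mask |= G_LOG_LEVEL_WARNING | G_LOG_LEVEL_CRITICAL;
     g_log_set_always_fatal (fatal_mask);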

- test programs should be good glib citizens by defining G_LOG_DOMAIN, so
   WARNING, CRITICAL, and ERROR printouts can correctly indicate the failing
   component. since multiple test programs usually go into the same directory,
   something like DEFS += -DG_LOG_DOMAIN='"$(basename $(@F))"' (for GNU make)
   or DEFS += -DG_LOG_DOMAIN='"$@"' (for portable makefiles) needs to be used.


as far as a "testing framework" is needed for GLib/Gtk+, i think it would
be sufficient to have a pair of common testutils.[hc] files that provide:

1- an initialization function that calls gtk_init() and preparses
arguments relevant for test programs. this should also make all WARNINGs
and CRITICALs fatal.

2- a function to register all widget types provided by Gtk+, (useful for
automated testing).

3- a function to fork off a test and assert it fails in the expected place
(around a certain statement).

4- it may be helpful to have a fork-off and timeout helper function as well.

5- simple helper macros to indicate test start/progress/assertions/end.
(we've at least found these useful to have in Beast.)

6- output formatting functions to consistently present performance
measurements in a machine parsable manner.

if i'm not mistaken, test frameworks like Check would only help us out with
3, 4 and to some extent 5. i don't think this warrants a new package
dependency, especially since 5 might be highly customized and 3 or 4 could
be useful to provide generally in GLib.

Re: Gtk+ unit tests (brainstorming)

2006-10-25 Thread Havoc Pennington
Hi,

When coding dbus I thought I'd try a project with a focus on unit tests.
It has (or at least had for a while) exceptionally high test coverage, 
around 75% of basic blocks executed in make check. The coverage-analyzer 
has been busted for a couple years though.

Here are my thoughts from dbus:

  - the "make coverage-report" was by far the biggest win I spent time
on. It gave percentage basic blocks tested, globally, by module,
and by file. Then for each file, it did the gcov-style annotation
to show what was not covered.

So when coding, you could make coverage report, then see what you
had failed to test. Also, if you just wanted to work on improving
test coverage, you could use make coverage-report to find
areas to work on.

the coverage-report target simply depended on make check, so it
was only a single command to run all the tests and look at
their coverage in a helpful format.

  - frequently I needed to add extra interfaces or levels of
abstraction to be able to test effectively. For example,
allowing "dummy" implementations of dependency
modules to be inserted underneath a module I was testing.

dbus is heavily conditionalized on a DBUS_BUILD_TESTS
parameter, which allows adding all kinds of test-only code
without fear of bloating the production version. One price
of this is that the tested lib is slightly different from the
production lib.

  - based on how nautilus does unit tests, I put the tests in the file
with the code being tested. The rationale is similar to the
rationale for inline documentation. I think it's a good approach,
but it does require a distinct "test build" (DBUS_BUILD_TESTS).

Another advantage of this is that internal code can be tested, while
it may not be possible to fully exercise internal code using the
public API.
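(As an illustrative sketch of that pattern - not actual dbus code, all names
here are hypothetical - a module can carry its own unit test, compiled only
in a test build:)

   /* in dbus-foo.c, right next to the code it exercises */
   #ifdef DBUS_BUILD_TESTS
   /* self-test for this module, called from the test harness */
   dbus_bool_t
   _dbus_foo_test (void)
   {
     /* internal, non-exported helpers of this file can be exercised here */
     if (!some_internal_helper (42))
       return FALSE;
     return TRUE;
   }
   #endif /* DBUS_BUILD_TESTS */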

Havoc



Re: Gtk+ unit tests (brainstorming)

2006-10-25 Thread Federico Mena Quintero
On Wed, 2006-10-25 at 17:52 +0200, Tim Janik wrote:

> while analysing the need for a testing framework and whether it makes sense
> for GLib and Gtk+ to depend on yet another package for the sole purpose of
> testing, i made/had the following observations/thoughts:

Wooo!  Thanks for working on this, Tim; it would be *fantastic* to unify
all the disparate tests that we have right now, as well as write new
ones.

> - for reasons also mentioned in the aforementioned blog entry it might
>be a good idea for Gtk+ as well to split up tests into things that
>can be checked quickly, things that are checked thoroughly but take long,
>and performance/benchmark tests.
>these can be executed by make targets check, slowcheck and perf
>respectively.

This is a good idea.  For testing the file chooser, it helped a lot to
write some "black box tests" which are essentially smoke tests.  They
just test the very basic functionality of the chooser, without going
into the whole test suite.  The black box tests helped me fix a bug with
many ramifications quickly, since at all times I could ensure that none
of the basic functionality was broken.

Performance tests should definitely be separate.

> - for time bound tasks it can also make sense to fork a test and after
>a certain timeout, abort and fail the test.

Yes.  Hans Petter uses this approach for his Flow testing, and it's very
cool.

> - homogeneous or consistent test output might be desirable in some contexts.
>so far, my experience has been that for simple make check runs, the most
>important things are that it's fast enough for people to run frequently
>and that it succeeds.

One thing about this... it would be good to keep megawidgets in separate
tests, but then to have "make check" run all the tests for all
megawidgets.  That is, I'd like a single
"test-all-the-stuff-in-the-treeview" and a single
"test-all-the-stuff-in-the-file-chooser":  when coding in one, I
normally don't care about the other one.

So if all the test-* programs spit out the same output, we can collect it
easily from "make check".

Cairo has a very nice testing framework.  We can surely steal ideas from
it.

> - GLib based test programs should never produce a "CRITICAL **:" or
>"WARNING **:" message and succeed.

Definitely.  I have some simple code in autotestfilechooser.c to catch
this and fail the tests if we get warnings/criticals.

> 5- simple helper macros to indicate test start/progress/assertions/end.
> (we've at least found these useful to have in Beast.)

For the file chooser tests, I just return a boolean from the test
functions, which obviously indicates if the test passed or not.  What
sort of macros are you envisioning?

> - for a specific widget type, test input/output conditions of all API
>functions (only for valid use cases though)
> - similarly, test all input/output conditions of the Gdk API
> - try setting & getting all widget properties on all widgets over the full
>value ranges (sparsely covered by means of random numbers for instance)
> - try setting & getting all container child properties analogously

I believe the OSDL tests do this already.  We should steal that code;
there's a *lot* of properties to be tested!

> - check layout algorithms by laying out a child widget that does nothing but
>checking the coordinates it's laid out at.

Oh, nice idea.  This would be *really* nice even for non-widget
stuff, such as all the tricky layout going on in GtkTreeView and its
cell renderers.

> - generically query all key bindings of stock Gtk+ widgets, and activate them,
>checking that no warnings/criticals are generated.

Nice.

> - for all widget types, create and destroy them in a loop to:
>a) measure basic object setup performance
>b) catch obvious leaks
>(these would be slowcheck/perf tests)

Yeah.  GtkWidgetProfiler (in gtk+/perf) will help with this.  Manu
Cornet has improved it a lot for his theme engine torturer, so we should
use *that* version :)

> as always, feedback is appreciated, especially objections/concerns
> regarding the ideas outlined ;)

Again, thanks for working on this.  It will be reassuring to have a good
way to plug in new tests as we add new code to gtk+.  After the
infrastructure in autotestfilechooser was done, it was quite fun to add
new tests; before that, it was extremely cumbersome to start with a
blank sheet of paper every time I wanted to test something.

  Federico



Re: Gtk+ unit tests (brainstorming)

2006-10-25 Thread Carl Worth
On Wed, 25 Oct 2006 12:40:27 -0500, Federico Mena Quintero wrote:
> Cairo has a very nice testing framework.  We can surely steal ideas from
> it.

I don't know that cairo's "make check" stuff has much worth
stealing directly. But I will share some experiences at least.

There are some things I really don't like in cairo's "make check"
suite right now:

1. We are exporting a few (very tiny) test-specific functions in
   the library just so the test suite can get at them. These are
   explicitly not "part of" the API, but they do exist. They're mostly
   on the order of flipping an internal bit to change the behavior to
   exercise some specific piece of the library. But this approach
   really doesn't scale. It would probably be better to do this kind
   of thing in a "test build" as Havoc recently mentioned, (cairo
   currently doesn't have such a build).

2. The tests take forever to link. Each test right now is a separate
   program. I chose this originally so that I could easily execute
   individual tests, (something I still do regularly and still
   require). The problem is that with any change to the library, "make
   check" goes through the horrifically slow process of using libtool
   to re-link the hundred or so programs. One idea that's been floated
   to fix this is something like a single test program that links with
   the library, and then dlopens each test module (or something like
   this). Nothing like that has been implemented yet.

We do have log files and HTML output, both of which are very useful,
but the format of each is very ad-hoc and nothing that I would
recommend anybody copying. That stuff is also very strongly oriented
toward tests which can be verified by "visual diff" of two
images---this works well for cairo, of course, but wouldn't generalize
well.

We also did some work on hooking gcov up to our makefiles and getting
lovely HTML reports of how much coverage there is in the test
suite. As Havoc points out, this is really important, (but sadly, we
haven't yet made a good effort at using this data and writing directed
tests to improve the coverage). I'm not even 100% sure the gcov stuff
in the Makefiles works right now, but it might still be useful for
someone looking into how to start that kind of thing, (though the gcov
documentation might be even better).

> > - for all widget types, create and destroy them in a loop to:
> >a) measure basic object setup performance
> >b) catch obvious leaks
> >(these would be slowcheck/perf tests)
>
> Yeah.  GtkWidgetProfiler (in gtk+/perf) will help with this.  Manu
> Cornet has improved it a lot for his theme engine torturer, so we should
> use *that* version :)

Something that is worth stealing is some of the work I've been doing
in "make perf" in cairo. I've been putting a lot of effort into
getting the most reliable numbers out of running performance tests,
(and doing it as quickly as possible, too). I started with stuff that
was in Manu's torturer with revamping from Benjamin Otte and I've
further improved it from there.

Some of the useful stuff is things such as using CPU performance
counters for measuring "time", (which of course I didn't write, but
just got from liboil---thanks David!), and then some basic statistical
analysis---such as reporting the average and standard deviation over
many short runs timed individually, rather than just timing many runs
as a whole, (which gives the same information as the average, but
without any indication of how stable the results are from one to the
next).

The statistical stuff could still be improved, (as I described in a
recent post to performance-list), but I think it is a reasonable
starting point.

Oh, and my code also takes care to do things like ensuring that the X
server has actually finished drawing what you asked it to, (I think
GtkWidgetProfiler does that as well---but perhaps with a different
approach). My stuff uses a single-pixel XGetImage just before starting
or stopping the timer.
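(Roughly, and only as a sketch of the idea rather than the actual cairo-perf
code, that synchronization looks like this:)

    #include <X11/Xlib.h>
    #include <X11/Xutil.h>

    static void
    wait_for_rendering (Display *dpy, Drawable d)
    {
        /* a 1x1 XGetImage is a round trip that forces the server to finish
         * any rendering queued for the drawable before it replies */
        XImage *image = XGetImage (dpy, d, 0, 0, 1, 1, AllPlanes, ZPixmap);
        if (image)
            XDestroyImage (image);
    }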

> > as always, feedback is appreciated, especially objections/concerns
> > regarding the ideas outlined ;)

Excellent stuff. Testing is really important and often
neglected. Never forget the truth that Keith Packard likes to share
often:

Untested code == Broken code

-Carl




Re: Gtk+ unit tests (brainstorming)

2006-10-25 Thread Michael Urman
On 10/25/06, Tim Janik <[EMAIL PROTECTED]> wrote:
> - GLib based test programs should never produce a "CRITICAL **:" or
>"WARNING **:" message and succeed.

It would be good not to make it impossible to test WARNINGs and
CRITICALs. After all, error cases are often the least tested part of
an application, so it's important to make sure the base library
detects and handles the error cases correctly.

-- 
Michael Urman


Re: Gtk+ unit tests (brainstorming)

2006-10-25 Thread Havoc Pennington
Michael Urman wrote:
> On 10/25/06, Tim Janik <[EMAIL PROTECTED]> wrote:
>> - GLib based test programs should never produce a "CRITICAL **:" or
>>"WARNING **:" message and succeed.
> 
> It would be good not to make it impossible to test WARNINGs and
> CRITICALs. After all, error cases are often the least tested part of
> an application, so it's important to make sure the base library
> detects and handles the error cases correctly.
> 

Testing those is like testing segfault handling, i.e. just nuts. The 
behavior is undefined once they print. (Well, for critical anyway. 
g_warning seems to be less consistently used)

Havoc


Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Andy Wingo
Hi,

On Wed, 2006-10-25 at 17:52 +0200, Tim Janik wrote:
> - GLib based test programs should never produce a "CRITICAL **:" or
>"WARNING **:" message and succeed.

Sometimes it is useful to check that a critical message was indeed
shown, and then move on. GStreamer installs a log handler that aborts if
criticals are shown unexpectedly, but also has the ability to test that
GStreamer fails appropriately.

For example, here is a test from gstreamer/tests/check/gst/gstobject.c:

/* g_object_new on abstract GstObject should fail */
GST_START_TEST (test_fail_abstract_new)
{
  GstObject *object;

  ASSERT_CRITICAL (object = g_object_new (gst_object_get_type (), NULL));
  fail_unless (object == NULL, "Created an instance of abstract GstObject");
}

I count 44 uses of this pattern in GStreamer core's test suite, based on
libcheck (but with a wrapper).
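(For illustration only - this is not GStreamer's actual implementation - such
an ASSERT_CRITICAL can be built on plain GLib plus check's fail_unless()
roughly like this:)

  static gboolean critical_seen = FALSE;

  static void
  expecting_log_handler (const gchar *domain, GLogLevelFlags level,
                         const gchar *message, gpointer user_data)
  {
    if (level & G_LOG_LEVEL_CRITICAL)
      critical_seen = TRUE;               /* swallow the expected critical */
    else
      g_log_default_handler (domain, level, message, user_data);
  }

  #define ASSERT_CRITICAL(code) G_STMT_START {                              \
    GLogLevelFlags old_fatal;                                               \
    GLogFunc old_handler;                                                   \
    critical_seen = FALSE;                                                  \
    /* temporarily keep criticals from aborting the test program */         \
    old_fatal = g_log_set_always_fatal (G_LOG_FATAL_MASK);                  \
    old_handler = g_log_set_default_handler (expecting_log_handler, NULL);  \
    { code; }                                                               \
    g_log_set_default_handler (old_handler, NULL);                          \
    g_log_set_always_fatal (old_fatal);                                     \
    fail_unless (critical_seen, "expected a CRITICAL from: " #code);        \
  } G_STMT_END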

Regards,

Andy.
-- 
http://wingolog.org/



Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Michael Urman
On 10/25/06, Havoc Pennington <[EMAIL PROTECTED]> wrote:
> Testing those is like testing segfault handling, i.e. just nuts. The
> behavior is undefined once they print. (Well, for critical anyway.
> g_warning seems to be less consistently used)

Certainly setting out to test all critical cases would not add value
corresponding to the effort; criticals are a different beast I
shouldn't have included. Even for warnings, in certain cases making
error cases testable would slow down real-life performance without
benefit. But preemptively deciding it's always impossible to test
resilience of certain known warnings is a misstep. An option like
-Werror is really useful, but hard-wiring -Werror is too limiting.

Warnings especially, by not being criticals, imply a contract that the
call will function reasonably (not necessarily "correctly") even
during incorrect use. If this is not tested, it will not be correct.

-- 
Michael Urman


Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tim Janik
On Thu, 26 Oct 2006, Michael Urman wrote:

> On 10/25/06, Havoc Pennington <[EMAIL PROTECTED]> wrote:
>> Testing those is like testing segfault handling, i.e. just nuts. The
>> behavior is undefined once they print. (Well, for critical anyway.
>> g_warning seems to be less consistently used)
>
> Certainly setting out to test all critical cases would not add value
> corresponding to the effort; criticals are a different beast I
> shouldn't have included. Even for warnings, in certain cases making
> error cases testable would slow down real life performance without
> benefit. But preemptively deciding it's always impossible to test
> resilience of certain known warnings is a misstep. An option like
> -Werror is really useful, but hard wiring -Werror is too limiting.

the analogy doesn't quite hold. and btw, i'm not a fan of -Werror fwiw ;)

> Warnings especially, by not being criticals, imply a contract that the
> call will function reasonably (not necessarily "correctly") even
> during incorrect use. If this is not tested, it will not be correct.

this is not quite true for the GLib context. once a program triggers any
of g_assert*(), g_error(), g_warning() or g_critical() (this is nowadays
used for the implementation of return_if_fail), the program/library is
in an undefined state. that's because the g_log() error/warning/critical
cases are widely used to report *programming* errors, not user errors
(g_log wasn't initially designed with that scenario in mind, but this is
how its use effectively turned out).

so for the vast majority of cases, aborting on any such event and failing a
test is the right thing to do.

that doesn't imply no tests should be present at all to test correct
assertions per se. such tests can be implemented by installing g_log
handlers, reconfiguring the fatality of certain log levels and by
employing a forked test mode (i addressed most of these in my original
email to some extent). these kinds of tests will be rare though, and
also need to be carefully crafted because specific assertions and checks
can simply be optimized away under certain configurations, so relying
on them is of questionable merit. you might want to read up on
G_DISABLE_ASSERT or G_DISABLE_CAST_CHECKS for more details.

also, experience (s/experience/bugzilla/) tells us that the majority
of bugs reported against GLib/Gtk+ are not about missing assertions
or errors, which provides a strong hint as to what kind of tests you'd
want to put your focus on.

> -- 
> Michael Urman

---
ciao, TJ


Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tristan Van Berkom
Havoc Pennington wrote:

>Michael Urman wrote:
>  
>
>>On 10/25/06, Tim Janik <[EMAIL PROTECTED]> wrote:
>>
>>
>>>- GLib based test programs should never produce a "CRITICAL **:" or
>>>   "WARNING **:" message and succeed.
>>>  
>>>
>>It would be good not to make it impossible to test WARNINGs and
>>CRITICALs. After all, error cases are often the least tested part of
>>an application, so it's important to make sure the base library
>>detects and handles the error cases correctly.
>>
>
>Testing those is like testing segfault handling, i.e. just nuts. The 
>behavior is undefined once they print. (Well, for critical anyway. 
>g_warning seems to be less consistently used)
>  
>
On the other hand - it's happened to me several times to have gtk+
crash when setting up properties on an object generically from glade,
where a simple g_return_if_fail() guard on that public api would have
saved me the trouble of a segfault (sometimes one property has no
meaning if another one hasn't been set up yet - in which case a
g_return_if_fail() guard would be appropriate).

Whether or not it's really, really relevant, I think that the critical
warnings from a function that wasn't fed the correct arguments,
or whose arguments are invalid because of the current object state - are part of
the contract of the api, and for whatever that's worth, maybe
worth testing for.

Ummm, while I'm here - I'd also like to say that - (I'm not about to
dig up the quote from the original email but) - I think that there
is some value to reporting all the tests that fail - even after one
of the tests has failed, based on the same principles that you'd
want your compiler to tell you everything that went wrong in its
parse (not just the first error), maybe not all spewed out to the console
by default - but having the whole test results in a log file can
save valuable developer time in some situations.

Cheers,
  -Tristan



Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tim Janik
On Thu, 26 Oct 2006, Tristan Van Berkom wrote:

> Havoc Pennington wrote:
>
>> Michael Urman wrote:
>>
>>
>>> On 10/25/06, Tim Janik <[EMAIL PROTECTED]> wrote:
>>>
>>>
 - GLib based test programs should never produce a "CRITICAL **:" or
   "WARNING **:" message and succeed.


>>> It would be good not to make it impossible to test WARNINGs and
>>> CRITICALs. After all, error cases are often the least tested part of
>>> an application, so it's important to make sure the base library
>>> detects and handles the error cases correctly.
>>>
>>
>> Testing those is like testing segfault handling, i.e. just nuts. The
>> behavior is undefined once they print. (Well, for critical anyway.
>> g_warning seems to be less consistently used)
>>
>>
> On the other hand - it's happened to me several times to have gtk+
> crash when setting up properties on an object generically from glade,

please make sure to file these as bug reports against Gtk+.
gtk shouldn't crash if properties are set within valid ranges.

> where a simple g_return_if_fail() guard on that public api would have
> saved me the trouble of a segfault

i repeat, relying on g_return_if_fail() for production code is at best
futile and at worst dangerous. it might be defined to a NOP.

> (sometimes one property has no
> meaning if another one hasn't been set up yet - in which case a
> g_return_if_fail() guard would be appropriate).

wrong, some property values are intentionally set up to support
free-order settings. e.g. foo=5; foo-set=true; or foo-set=true; foo=5;
every order restriction or property dependency that is introduced in the
property API makes generic handling of properties harder or sometimes
impossible and so should be avoided at all costs.
(examples for generic property handling code are language bindings,
property editors, glade, gtk-doc)

> Whether or not it's really, really relevant, I think that the critical
> warnings from a function that wasn't fed the correct arguments,
> or whose arguments are invalid because of the current object state - are part of
> the contract of the api, and for whatever that's worth, maybe
> worth testing for.

no, they are not. functions are simply defined within the range
specified by the implementor/designer, and in that vein, for a lot
of glib/gtk functions, i'm informing you that the occasional return_if_fail
statements you may or may not happen to find in the glib/gtk code base
are a pure convenience tool to catch programming mistakes more easily.

--- that being said, please read on ;)

the above is intentionally worded conservatively, e.g. to allow removal
of return_if_fail statements in the future, so value ranges a function
operates on can be extended *compatibly*. in practice however, we're
doing our best to add missing warning/checks that help programmers
and users to code and use our libraries in the most correct and
reliable way. but by definition, a library/program is broken once a
warning/critical/assertion/error has been triggered and there is no
point in verifying that these are in fact triggered. i.e. you'd
essentially try to verify that you can make the library/program enter
an undefined state by a certain sequence of usage patterns, and that
is rather pointless.

> Ummm, while I'm here - I'd also like to say that - (I'm not about to
> dig up the quote from the original email but) - I think that there
> is some value to reporting all the tests that fail - even after one
> of the tests has failed, based on the same principals that you'd
> want your compiler to tell you everything that went wrong in its
> parse (not just the first error),

well, try to get your compiler to tell you about all compilation errors
in a project covering multiple files or directories if you broke a
multitude of files in different directories/libraries. it simply
won't work for the vast majority of cases, and also isn't useful in
practice. similar to unit tests, most times a compiler yields errors,
you have to fix the first one before being able to move on.

> maybe not all spewed out to the console
> by default - but having the whole test results in a log file can
> save valuable developer time in some situations.

i don't quite see the scenario here, so it might be good if you cared
to elaborate on this. as described in my initial email, if anything breaks,
it either shouldn't be under test or should be fixed, and i provided
reasoning for why that is the case. if you really mean to challenge
this approach, please also make an effort to address the reasoning
i provided.

> Cheers,
>  -Tristan

---
ciao, TJ


Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Iago Toral Quiroga
On Wed, 25-10-2006 at 17:52 +0200, Tim Janik wrote:

> - Unit tests should run fast - a test taking 1/10th of a second is a slow
>unit test, i've mentioned this in my blog entry already.

Sure, very important; otherwise developers will tend to neither use
nor maintain the tests.

> - in the common case, test results should be reduced to a single boolean:
>  "all tests passed" vs. "at least one test failed"
>many test frameworks provide means to count and report failing tests
>(even automake's standard check:-rule), there's little to no merit to
>this functionality though.
>letting more than one test fail while continuing to work in an
>unrelated area rapidly leads to confusion about which tests are
>supposed to work and which aren't, especially in multi-contributor setups.
>figuring out whether the right tests passed suddenly requires scanning
>the test logs and remembering the last count of tests that may validly
>fail. this defeats the purpose of using a single quick make check run to
>be confident that one's changes didn't introduce breakage.
>as a result, the whole test harness should always either succeed or
>be immediately fixed.

I understand your point; however, I still think that being able to get a
wider report with all the tests failing at a given moment is also
interesting (for example in a buildbot continuous integration loop, like
the one being prepared by the build-brigade). Besides, if there is a
group of people who want to work on fixing bugs at the same time, they
would need to get a list of the failing tests, not only the first one.

> - for reasons also mentioned in the aforementioned blog entry it might
>be a good idea for Gtk+ as well to split up tests into things that
>can be checked quickly, things that are checked thoroughly but take long,
>and performance/benchmark tests.
>these can be executed by make targets check, slowcheck and perf
>respectively.

Yes, seems an excellent idea to me.

> - homogeneous or consistent test output might be desirable in some contexts.

Yes, it is an important point when thinking about a continuous
integration tool for Gnome. If tests for all modules in Gnome agree on a
common output format, then that data can be collected, processed and
presented by a continuous integration tool like buildbot and would make
it easy to do cool things with those test results. In the build-brigade
we had also talked a bit on this subject.

>so far, my experience has been that for simple make check runs, the most
>important things are that it's fast enough for people to run frequently
>and that it succeeds.
>if parts that feel somewhat slow are hard to avoid, a progress indicator
>can help a lot to make the required waiting time bearable. so, here the exact
>output isn't too important as long as some progress is displayed.

Yes, good point. 

> - GLib based test programs should never produce a "CRITICAL **:" or
>"WARNING **:" message and succeed. the reasoning here is that CRITICALs
>and WARNINGs are indicators for an invalid program or library state,
>anything can follow from this.
>since tests are in place to verify correct implementation/operation, an
>invalid program state should never be reached. as a consequence, all tests
>should upon initialization make CRITICALs and WARNINGs fatal (as if
>--g-fatal-warnings was given).

Maybe you would like to test how the library handles invalid input. For
example, let's say we have a function that accepts a pointer as a
parameter; I think it is worth knowing whether that function safely handles
the case where that pointer is NULL (if that is not an allowed value for
that parameter) or whether it produces a segmentation fault in that case.

> as far as a "testing framework" is needed for GLib/Gtk+, i think it would
> be sufficient to have a pair of common testutils.[hc] files that provide:
> 
> 1- an initialization function that calls gtk_init() and preparses
> arguments relevant for test programs. this should also make all WARNINGs
> and CRITICALs fatal.
> 
> 2- a function to register all widget types provided by Gtk+, (useful for
> automated testing).
> 
> 3- a function to fork off a test and assert it fails in the expected place
> (around a certain statement).
> 
> 4- it may be helpful to have a fork-off and timeout helper function as well.
> 
> 5- simple helper macros to indicate test start/progress/assertions/end.
> (we've at least found these useful to have in Beast.)
> 
> 6- output formatting functions to consistently present performance 
> measurements
> in a machine parsable manner.
> 
> 
> if i'm not mistaken, test frameworks like Check would only help us out with
> 3, 4 and to some extend 5. i don't think this warrants a new package
> dependency, especially since 5 might be highly customized and 3 or 4 could be
> useful to provide generally in GLib.
> 

I'll add here some points suppo

Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Tristan Van Berkom
Tim Janik wrote:

> On Thu, 26 Oct 2006, Tristan Van Berkom wrote:
>
>> Havoc Pennington wrote:
>>
>>> Michael Urman wrote:
>>>
>>>
 On 10/25/06, Tim Janik <[EMAIL PROTECTED]> wrote:


> - GLib based test programs should never produce a "CRITICAL **:" or
>   "WARNING **:" message and succeed.
>
>
 It would be good not to make it impossible to test WARNINGs and
 CRITICALs. After all, error cases are often the least tested part of
 an application, so it's important to make sure the base library
 detects and handles the error cases correctly.

>>>
>>> Testing those is like testing segfault handling, i.e. just nuts. The
>>> behavior is undefined once they print. (Well, for critical anyway.
>>> g_warning seems to be less consistently used)
>>>
>>>
>> On the other hand - it's happened to me several times to have gtk+
>> crash when setting up properties on an object generically from glade,
>
>
> please make sure to file these as bug reports against Gtk+.
> gtk shouln't crash if properties are set within valid ranges.

Yes siree, usually if not always - I do.

>> where a simple g_return_if_fail() guard on that public api would have
>> saved me the trouble of a segfault
>
>
> i repeat, relying on g_return_if_fail() for production code is at best
> futile and at worst dangerous. it might be defined to a NOP.
>
Yes, my point here is only that having a warning for misusing the api,
particularly in /development/ code - should be part of the contract of
the api - maybe it doesn't make sense to test for it... fuzzy.

>> (sometimes one property has no
>> meaning if another one hasn't been set up yet - in which case a
>> g_return_if_fail() guard would be appropriate).
>
>
> wrong, some property values are intentionally set up to support
> free-order settings. e.g. foo=5; foo-set=true; or foo-set=true; foo=5;
> every order restriction or property dependency that is introduced in the
> property API makes generic handling of properties harder or sometimes
> impossible and so should be avoided at all costs.
> (examples for generic property handling code are language bindings,
> property editors, glade, gtk-doc)

I think that's a little uncooperative - wouldn't you say, on the other hand,
that the independence of property ordering - aside from possible real weird
corner cases - should be promoted as much as possible to make life easier
for language binding writers, property editors and such?

It is of course a two-way street, in the sense that one should not be misled
and should not program defensively in this respect.

>> Whether or not it's really, really relevant, I think that the critical
>> warnings from a function that wasn't fed the correct arguments,
>> or whose arguments are invalid because of the current object state - are part of
>> the contract of the api, and for whatever that's worth, maybe
>> worth testing for.
>
>
> no, they are not. functions are simply defined within the range
> specified by the implementor/designer, and in that vein, for a lot
> of glib/gtk functions, i'm informing you that the occasional return_if_fail
> statements you may or may not happen to find in the glib/gtk code base
> are a pure convenience tool to catch programming mistakes more easily.
>
> --- that being said, please read on ;)
>
> the above is intentionally worded conservatively, e.g. to allow removal
> of return_if_fail statements in the future, so value ranges a function
> operates on can be extended *compatibly*. in practice however, we're
> doing our best to add missing warning/checks that help programmers
> and users to code and use our libraries in the most correct and
> reliable way. but by definition, a library/program is broken once a
> warning/critical/assertion/error has been triggered and there is no
> point in verifying that these are in fact triggered. i.e. you'd
> essentially try to verify that you can make the library/program enter
> an undefined state by a certain sequence of usage patterns, and that
> is rather pointless.

Yes I see, saying whether it is part of the "contract" may sound binding,
but an effort to report api misuse is definitely needed. Also, adding
a g_return_if_fail() to your function should probably not constitute an
addition to your test suite (i.e. you'd usually progressively add tests
to ensure that your api complains properly when misused - well,
I guess it's a lot of effort for a low yield).

>> Ummm, while I'm here - I'd also like to say that - (I'm not about to
>> dig up the quote from the original email but) - I think that there
>> is some value to reporting all the tests that fail - even after one
>> of the tests has failed, based on the same principals that you'd
>> want your compiler to tell you everything that went wrong in its
>> parse (not just the first error),
>
>
> well, try to get your compiler to tell you about all compilation errors
> in a project covering multiple files or directories if you broke a
> multitude of files in different directories/libraries. it simply
> won't work for the vast majority of cases, and also isn't useful in
> practice.

Re: Gtk+ unit tests (brainstorming)

2006-10-26 Thread Bill Haneman
Hi:
>
>   
>> - homogeneous or consistent test output might be desirable in some contexts.
>> 
>
> Yes, it is an important point when thinking about a continuous
> integration tool for Gnome. If tests for all modules in Gnome agree on a
> common output format, then that data can be collected, processed and
> presented by a continuous integration tool like buildbot and would make
> it easy to do cool things with those test results. In the build-brigade
> we had also talked a bit on this subject.
>   
While I agree that for gtk+, a lightweight unit test framework with 
limited dependencies makes sense, for gnome as a whole, at the Boston 
summit we proposed a test tinderbox system for Gnome based on 
dogtail/LDTP or a similar system, using AT-SPI as the framework. 

This would not only be able to test (and integration-test) individual
components and the Gnome stack, but would also allow automated testing of
accessibility support and early detection of regressions in these
areas (which have been major problems throughout the Gnome 2.X series).

See this thread: 
http://mail.gnome.org/archives/dogtail-devel-list/2006-October/msg00011.html
regarding the creation of [EMAIL PROTECTED]

regards

Bill




Re: Gtk+ unit tests (brainstorming)

2006-10-30 Thread Tim Janik
On Wed, 25 Oct 2006, Havoc Pennington wrote:

> Hi,
>
> When coding dbus I thought I'd try a project with a focus on unit tests.
> It has (or at least had for a while) exceptionally high test coverage,
> around 75% of basic blocks executed in make check. The coverage-analyzer
> has been busted for a couple years though.
>
> Here are my thoughts from dbus:
>
>  - the "make coverage-report" was by far the biggest win I spent time
>on.

ah, interesting. could you please explain why you consider it
such a big win?

>  - frequently I needed to add extra interfaces or levels of
>abstraction to be able to test effectively. For example,
>allowing "dummy" implementations of dependency
>module to be inserted underneath a module I was testing.
>
>dbus is heavily conditionalized on a DBUS_BUILD_TESTS
>parameter, which allows adding all kinds of test-only code
>without fear of bloating the production version. One price
>of this is that the tested lib is slightly different from the
>production lib.

ah, good to know. though i'd consider that price considerably high for a
project of the size and build time of Gtk+, and where we'd really benefit
from having *many* developers and contributors run make check.
especially when you have a quite large legacy code base, instead of
developing with conditionalized test hooks from the start.

>  - based on how nautilus does unit tests, I put the tests in the file
>with the code being tested. The rationale is similar to the
>rationale for inline documentation. I think it's a good approach,
>but it does require a distinct "test build" (DBUS_BUILD_TESTS).

sounds interesting as well. the downside is of course the assorted
file growth, and gtk+ already isn't a particularly good citizen in
terms of loc per file ;)

$ wc -l *.c | sort -r | head
   380899 total
14841 gtktreeview.c
11360 gtkaliasdef.c
 9154 gtkiconview.c
 8764 gtkfilechooserdefault.c
 8632 gtktextview.c
 8060 gtkwidget.c

>Another advantage of this is that internal code can be tested, while
>it may not be possible to fully exercise internal code using the
>public API.

thanks for your insight havoc. i'll definitely look into the coverage
report generation at some later point.

> Havoc

---
ciao, TJ


Re: Gtk+ unit tests (brainstorming)

2006-10-30 Thread Dimi Paun

On Mon, October 30, 2006 9:34 am, Tim Janik wrote:
>>  - based on how nautilus does unit tests, I put the tests in the file
>>with the code being tested. The rationale is similar to the
>>rationale for inline documentation. I think it's a good approach,
>>but it does require a distinct "test build" (DBUS_BUILD_TESTS).
>
> sounds interesting as well. the downsize is of course the assorted
> file growth, and gtk+ already isn't a particularly good citizen in
> terms of loc per file ;)

The problem with documentation is that it is only maintained by humans,
so the only way to have a hope of consistency is to have it close to the
code that it is documenting.

Tests do not benefit that much, since any consistency problem will simply
result in a very evident test failure. As long as you never allow tests
to fail, you simply do not have the problem that inline documentation is
trying to solve.

Moreover, tests tend to be long winded and ugly looking, and I find the
additional cruft they add to be rather distracting.

I submit that due to their nature, they work out better in separate files.

-- 
Dimi Paun <[EMAIL PROTECTED]>
Lattica, Inc.




Re: Gtk+ unit tests (brainstorming)

2006-10-30 Thread Havoc Pennington
Tim Janik wrote:
> 
> ah, interesting. could you please explain why you consider it
> such a big win?
> 

Without it I think I usually write about 10% coverage, and imagine in my 
mind that it is 50% or so ;-) I'm guessing this is pretty common.

With it, it was easy to just browse and say "OK, this part isn't tested 
yet, this part is tested too much so we can speed up the tests," etc.

Also, if someone submits a patch with tests, you can see if their tests 
are even exercising their code.

It just gives you a way to know how well you're doing and see what else 
needs doing.

The special gcov replacement that bitrotted in dbus did a couple things:
  - merged results from multiple executables into one report
  - omitted the test harness itself from the report, i.e. without
my special hacks if you have:
 if (test_failed())
   assert_not_reached();
then gcov would count assert_not_reached() as an uncovered block
in the stats.

Maybe some other changes on top of gcov too, I don't remember.

The report had coverage % for the whole project, each directory, and 
each file in one report. And then it spit out the annotated source for 
each file.

Anyone can cobble together this kind of info with enough work; part of
the point is that --enable-coverage in configure, plus "make 
coverage-report" makes it so easy to run the report that it happens much 
more often.

Last time I was looking at fixing it I started hacking on a valgrind 
tool as a replacement for trying to keep up with the ever-changing gcov 
file format, but I didn't really work on that beyond posting a 
half-baked initial patch to the valgrind list.

Havoc



Re: Gtk+ unit tests (brainstorming)

2006-10-30 Thread mathieu lacage
On Mon, 2006-10-30 at 15:34 +0100, Tim Janik wrote:
> On Wed, 25 Oct 2006, Havoc Pennington wrote:
> 
> > Hi,
> >
> > When coding dbus I thought I'd try a project with a focus on unit tests.
> > It has (or at least had for a while) exceptionally high test coverage,
> > around 75% of basic blocks executed in make check. The coverage-analyzer
> > has been busted for a couple years though.
> >
> > Here are my thoughts from dbus:
> >
> >  - the "make coverage-report" was by far the biggest win I spent time
> >on.
> 
> ah, interesting. could you please explain why you consider it
> such a big win?

I use a similar setup in a project of mine and the big win is, for me,
the ability to know what needs to be done: it is easy to spot the
locations which are never tested and thus likely to be buggy.

> 
> >  - frequently I needed to add extra interfaces or levels of
> >abstraction to be able to test effectively. For example,
> >allowing "dummy" implementations of dependency
> >module to be inserted underneath a module I was testing.
> >
> >dbus is heavily conditionalized on a DBUS_BUILD_TESTS
> >parameter, which allows adding all kinds of test-only code
> >without fear of bloating the production version. One price
> >of this is that the tested lib is slightly different from the
> >production lib.
> 
> ah, good to know. though i'd consider that price considerably high for a
> project of the size and build time of Gtk+, and where we'd really benefit
> from having *many* developers and contributors run make check.
> especially, when you have a quite large legacy code base, instead of
> developing with conditionalized test hooks from the start.

Usually, these test hooks are always ON for developer builds and only
OFF for release builds. 

> >  - based on how nautilus does unit tests, I put the tests in the file
> >with the code being tested. The rationale is similar to the
> >rationale for inline documentation. I think it's a good approach,
> >but it does require a distinct "test build" (DBUS_BUILD_TESTS).
> 
> sounds interesting as well. the downside is of course the assorted
> file growth, and gtk+ already isn't a particularly good citizen in
> terms of loc per file ;)
> 
> $ wc -l *.c | sort -r | head
>380899 total
> 14841 gtktreeview.c
> 11360 gtkaliasdef.c
>  9154 gtkiconview.c
>  8764 gtkfilechooserdefault.c
>  8632 gtktextview.c
>  8060 gtkwidget.c
> 
> >Another advantage of this is that internal code can be tested, while
> >it may not be possible to fully exercise internal code using the
> >public API.
> 
> thanks for your insight havoc. i'll definitely look into the coverage
> report generation at some later point.

I think havoc used his own report generation tool. You might want to
give lcov a try: it generates some nice-looking html.

Mathieu



Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Iago Toral Quiroga
Havoc Pennington wrote:
> Tim Janik wrote:
> > 
> > ah, interesting. could you please explain why you consider it
> > such a big win?
> > 
> 
> Without it I think I usually write about 10% coverage, and imagine in my 
> mind that it is 50% or so ;-) I'm guessing this is pretty common.
> 
> With it, it was easy to just browse and say "OK, this part isn't tested 
> yet, this part is tested too much so we can speed up the tests," etc.
> 
> Also, if someone submits a patch with tests, you can see if their tests 
> are even exercising their code.
> 
> It just gives you a way to know how well you're doing and see what else 
> needs doing.

Sure! Tim, you can take a look here to see this in practice:
http://gtktests-buildbot.igalia.com/gnomeslave/gtk+/lcov/gtk/index.html

Those are the code coverage results for the tests I developed. As you
browse the files you can see which code is tested (blue) and which code
is not (red).  I think this helps with:

   * Realizing which code your tests are actually covering.
   * Designing new tests so they are not redundant.
   * Analyzing which execution branches are not tested for a given
     interface.
   * Easily checking which files have more tests and which ones need more
     testing work, based on coverage %.

Iago.


Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Tim Janik
On Wed, 25 Oct 2006, Federico Mena Quintero wrote:

> On Wed, 2006-10-25 at 17:52 +0200, Tim Janik wrote:

>> - GLib based test programs should never produce a "CRITICAL **:" or
>>"WARNING **:" message and succeed.
>
> Definitely.  I have some simple code in autotestfilechooser.c to catch
> this and fail the tests if we get warnings/criticals.

oh? hm, looking at the code, it doesn't seem to be significantly superior 
to --g-fatal-warnings:
   fatal_mask = g_log_set_always_fatal (G_LOG_FATAL_MASK);
   fatal_mask |= G_LOG_LEVEL_WARNING | G_LOG_LEVEL_CRITICAL;
   g_log_set_always_fatal (fatal_mask);
or am i missing something?
note that the error counting in your code doesn't function beyond 1, because
g_log always treats errors as FATAL; that behaviour is not configurable.

>> 5- simple helper macros to indicate test start/progress/assertions/end.
>> (we've at least found these useful to have in Beast.)
>
> For the file chooser tests, I just return a boolean from the test
> functions, which obviously indicates if the test passed or not.  What
> sort of macros are you envisioning?

just a very few and simple ones that are used to beautify progress 
indication, e.g.:

TSTART(printfargs) - print info about the test being run
TOK()              - print a progress hyphen unconditionally: '-'
TASSERT(cond)      - print a progress hyphen if cond==TRUE, abort otherwise
TCHECK(cond)       - like TASSERT() but skip hyphen printing
TDONE()            - finish test info / progress bar

for simple tests, i just use TSTART/TASSERT/TDONE, but for tests that explore
a wide range of numbers in a loop, a combination of TCHECK/TOK works better
to avoid excess progress indication. e.g.:

   TSTART ("unichar isalnum");
   TASSERT (g_unichar_isalnum ('A') == TRUE);   // prints hypen: '-'
   TASSERT (g_unichar_isalnum ('?') == FALSE);  // prints hypen: '-'
   for (uint i = 0; i < 100; i++)
 {
   gunichar uc = rand() % (0x100 << (i % 24));
   if (i % 2 == 0)
 TOK(); // prints hypen: '-'
   gboolean bb = Unichar::isalnum (uc);
   gboolean gb = g_unichar_isalnum (uc);
   TCHECK (bb == gb);   // silent assertion
 }
   TDONE();

produces:

unichar isalnum: [----------------------------------------------------]

if any of the checks fail, all of __FILE__, __LINE__, __PRETTY_FUNCTION__
and cond are printed of course, e.g.:
** ERROR **: strings.cc:270:int main(int, char**)(): assertion failed: 
g_unichar_isalnum ('?') == true

having java-style TASSERT_EQUAL(val1,val2) could probably also be nice,
in order to print the mismatching values with the error message.
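(for the curious, here are hedged sketches of what such macros could expand
to - this is just the idea, not Beast's actual implementation:)

   #define TSTART(...)   do { g_printerr (__VA_ARGS__); g_printerr (": ["); } while (0)
   #define TOK()         g_printerr ("-")
   #define TASSERT(cond) do { if (cond) TOK (); else                   \
     g_error ("%s:%d:%s(): assertion failed: %s",                      \
              __FILE__, __LINE__, __PRETTY_FUNCTION__, #cond); } while (0)
   #define TCHECK(cond)  do { if (!(cond))                             \
     g_error ("%s:%d:%s(): assertion failed: %s",                      \
              __FILE__, __LINE__, __PRETTY_FUNCTION__, #cond); } while (0)
   #define TDONE()       g_printerr ("]\n")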


>> - try setting & getting all widget properties on all widgets over the full
>>value ranges (sparsely covered by means of random numbers for instance)
>> - try setting & getting all container child properties analogously
>
> I believe the OSDL tests do this already.  We should steal that code;
> there's a *lot* of properties to be tested!

not sure what code you're referring to. but what i meant here is code that
generically queries all properties and explores their value range. this
kind of generic querying code is available in a lot of places already (beast,
gle, gtk prop editor, libglade, gtk-doc, LBs, etc.)


>> - for all widget types, create and destroy them in a loop to:
>>a) measure basic object setup performance
>>b) catch obvious leaks
>>(these would be slowcheck/perf tests)
>
> Yeah.  GtkWidgetProfiler (in gtk+/perf) will help with this.  Manu
> Cornet has improved it a lot for his theme engine torturer, so we should
> use *that* version :)

hm, i really don't have much of an idea about what it does beyond exposing
the main window in a loop ;)
gtk+/perf/README looks interesting though; are there any docs on this
beyond that file?

>  Federico

---
ciao, TJ


Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Tim Janik
On Wed, 25 Oct 2006, Carl Worth wrote:

> On Wed, 25 Oct 2006 12:40:27 -0500, Federico Mena Quintero wrote:

> There are some things I really don't like in cairo's "make check"
> suite right now:

> 2. The tests take forever to link. Each test right now is a separate
>   program. I chose this originally so that I could easily execute
>   individual tests, (something I still do regularly and still
>   require). The problem is that with any change to the library, "make
>   check" goes through the horrifically slow process of using libtool
>   to re-link the hundred or so programs. One idea that's been floated
>   to fix this is something like a single test program that links with
>   the library, and then dlopens each test module (or something like
>   this). Nothing like that has been implemented yet.

that won't quite work either, because libtool has to link shared modules
also, and that takes even longer. for beast, that's the sole issue: the
plugins/ dir takes forever to build and forever to install (libtool relinks
upon installation).

to avoid this type of hassle with the test programs, what we've been
doing is basically to put multiple tests into a single program, i.e.

static void
test_paths (void)
{
   TSTART ("Path handling");
   TASSERT (...);
   TDONE();
}
[...]

int
main (int   argc,
      char *argv[])
{
   birnet_init_test (&argc, &argv);

   test_cpu_info();
   test_paths();
   test_zintern();
   [...]
   test_virtual_typeid();

   return 0;
}

> Something that is worth stealing is some of the work I've been doing
> in "make perf" in cairo. I've been putting a lot of effort into
> getting the most reliable numbers out of running performance tests,
> (and doing it is quickly as possible yet). I started with stuff that
> was in Manu's torturer with revamping from Benjamin Otte and I've
> further improved it from there.
>
> Some of the useful stuff is things such as using CPU performance
> counters for measuring "time", (which of course I didn't write, but
> just got from liboil---thanks David!), and then some basic statistical
> analysis---such as reporting the average and standard deviation over
> many short runs timed individually, rather than just timing many runs
> as a whole, (which gives the same information as the average, but
> without any indication of how stable the results are from one to the
> next).

we've looked at cairo's perf output the other day, and one thing we really
failed to understand is why you print average values for your test runs.
granted, there may be some limited use in averaging over multiple runs
to get an idea of how much time "could" be consumed by a particular task,
but other numbers are much more interesting.

i.e. using averaging, your numbers include uninteresting outliers
that can result from scheduling artefacts (like measuring a whole second
for copying a single pixel), and they hide the interesting information,
which is the fastest possible performance encountered for your test code.

printing the median over your benchmark runs would give a much better
indication of the runtime to be expected, because outliers in either
direction are essentially ignored that way.

most interesting for benchmarking and optimization, however, is the minimum
time a specific operation takes, since in machine execution there is a hard
lower limit we're interested in optimizing. and apart from performance-clock
skews, there'll never be minimum-time measurement anomalies which we'd want
to ignore.

for beast, we've used a combination of calibration code to figure out the
minimum number of test run repetitions and taking the minimum of the
measurements, which yields quite stable and accurate results even in the
presence of concurrent background tasks like project/documentation build
processes.
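
schematically, the measurement side of that boils down to something like
this (illustrative sketch using GTimer, not the actual beast/birnet code):

#include <glib.h>

/* run a test function repeatedly and report the minimum elapsed time;
 * scheduling outliers can only ever increase a sample, never decrease it,
 * so they simply drop out of the minimum */
static double
minimum_elapsed (void (*test_func) (void), guint n_repetitions)
{
  GTimer *timer = g_timer_new ();
  double minimum = G_MAXDOUBLE;
  guint i;
  for (i = 0; i < n_repetitions; i++)
    {
      g_timer_start (timer);
      test_func ();
      g_timer_stop (timer);
      minimum = MIN (minimum, g_timer_elapsed (timer, NULL));
    }
  g_timer_destroy (timer);
  return minimum; /* seconds */
}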


> The statistical stuff could still be improved, (as I described in a
> recent post to performance-list), but I think it is a reasonable
> starting point.

well, apologies if median/minimum printing is simply still on your TODO ;)


> Oh, and my code also takes care to do things like ensuring that the X
> server has actually finished drawing what you asked it to, (I think
> GtkWidgetProfiler does that as well---but perhaps with a different
> approach). My stuff uses a single-pixel XGetImage just before starting
> or stopping the timer.

why exactly is that a good idea (and better than say XSync())?
does the X server implement logic like globally carrying out all
pending/enqueued drawing commands before allowing any image capturing?



> Never forget the truth that Keith Packard likes to share
> often:
>
>   Untested code == Broken code

heh ;)

> -Carl
>

---
ciao TJ


Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Carl Worth
[CC'ing Keith for a question near the end...]

On Tue, 31 Oct 2006 15:26:35 +0100 (CET), Tim Janik wrote:
> i.e. using averaging, your numbers include uninteresting outliers
> that can result from scheduling artefacts (like measuring a whole second
> for copying a single pixel), and they hide the interesting information,
> which is the fastest possible performance encountered for your test code.

If computing an average, it's obviously very important to eliminate
the slow outliers, because they will otherwise skew it radically. What
cairo-perf is currently doing for outliers is really cheesy,
(ignoring a fixed percentage of the slowest results). One thing I
started on was to do adaptive identification of outliers based on the
"> Q3 + 1.5 * IQR" rule as discussed here:

http://en.wikipedia.org/wiki/Outlier
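
(For the record, the cutoff itself is trivial to compute on a sorted list
of timings; the quartile indexing below is simplified, so this is only a
sketch and not the exact cairo-perf code.)

#include <stdlib.h>

/* cutoff for the "> Q3 + 1.5 * IQR" rule on an already-sorted array */
static double
outlier_cutoff (const double *sorted_times, size_t n)
{
  double q1  = sorted_times[n / 4];
  double q3  = sorted_times[(3 * n) / 4];
  double iqr = q3 - q1;
  return q3 + 1.5 * iqr;   /* samples above this count as outliers */
}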

I didn't push that work out yet, since I got busy with other
things. In that work, I was also reporting the median instead of the
average. Surprisingly, these two changes didn't help the stability as
much as I would have hoped. But the idea of using minimum times as
suggested below sounds appealing.

By the way, the reason I started with reporting averages is probably
just because I wanted to report some measure of the statistical
dispersion, and the measure I was most familiar with (the standard
deviation) is defined in terms of the arithmetic mean. But, to base
things on the median instead, I could simply report the "average
absolute deviation" from the median rather than the standard
deviation.

> most interesting for benchmarking and optimization however is the minimum
> time a specific operation takes, since in machine execution there is a hard
> lower limit we're interested in optimizing. and apart from performance
> clock skews, there'll never be minimum time measurement anomalies wich
> we wanted to ignore.

This is a good point and something I should add to cairo-perf. Changes
in the minimum really should indicate interesting changes, so it
should form a very stable basis for cairo-perf-diff to make its
decisions.

> >My stuff uses a single-pixel XGetImage just before starting
> > or stopping the timer.
>
> why exactly is that a good idea (and better than say XSync())?
> does the X server implement logic like globally carrying out all
> pending/enqueued drawing commands before allowing any image capturing?

My most certain answer is "That's what keithp told me to use".

Less certainly, I am under the impression that yes, the X server will
never provide a pixel result back that could be modified by
outstanding requests previously sent to the X server. I believe that
this is a protocol requirement, and as a result the XGetImage trick is
the only way to ensure that all drawing operations have completed.
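
The trick itself is tiny; a sketch in plain Xlib (not the actual cairo-perf
code) looks like this:

#include <X11/Xlib.h>
#include <X11/Xutil.h>

/* read one pixel back from the drawable: the reply to the GetImage request
 * cannot be generated before the server has processed the rendering
 * requests queued ahead of it */
static void
wait_for_rendering (Display *display, Drawable drawable)
{
  XImage *image = XGetImage (display, drawable,
                             0, 0, 1, 1,          /* a single pixel */
                             AllPlanes, ZPixmap);
  if (image)
    XDestroyImage (image);
}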

Keith can you confirm or deny the rationale for this approach? Is it
really "better" than XSync?

-Carl




Re: Gtk+ unit tests (brainstorming)

2006-10-31 Thread Stefan Kost
Hi Tim,

Tim Janik wrote:
> Hi all.
>
> as mentioned in another email already, i've recently worked on improving
> unit test integration in Beast and summarized this in my last blog entry:
>http://blogs.gnome.org/view/timj/2006/10/23/0 # Beast and unit testing
>
>   
I did a presentation on check-based unit tests at last GUADEC. Here are
the slides:
http://www.buzztard.org/files/guadec2005_advanced_unit_testing.pdf

I have had good experiences with check in GStreamer and also in
buzztard. IMHO it does not make sense to write our own test framework. The
unit tests are optional and IMHO it's not too much to ask developers
to install check - maybe print a teaser for people who build CVS code
and don't have check installed. And btw. most distros have it.
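
Just to show how little boilerplate that is, a minimal check-based test
program looks roughly like this (the suite/test names are made up for
illustration):

#include <check.h>
#include <stdlib.h>

START_TEST (test_object_creation)
{
  /* by default each test runs in its own forked child, so a crash or a
   * fatal warning here doesn't take the rest of the suite down */
  fail_unless (1 + 1 == 2, "basic arithmetic broke");
}
END_TEST

int
main (void)
{
  Suite   *s  = suite_create ("Widget");
  TCase   *tc = tcase_create ("core");
  SRunner *sr;
  int      failed;

  tcase_add_test (tc, test_object_creation);
  suite_add_tcase (s, tc);

  sr = srunner_create (s);
  srunner_run_all (sr, CK_NORMAL);
  failed = srunner_ntests_failed (sr);
  srunner_free (sr);
  return failed == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}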
> while analysing the need for a testing framework and whether it makes sense
> for GLib and Gtk+ to depend on yet another package for the sole purpose of
> testing, i made/had the following observations/thoughts:
>
> - Unit tests should run fast - a test taking 1/10th of a second is a slow
>unit test, i've mentioned this in my blog entry already.
>   
In the slides we were talking about the concept of test aspects. You do
positive and negative tests, performance and stress tests. It makes
sense to organize the testsuite to reflect this. It might also make
sense to have one test binary per widget (class). This way you can
easily run single tests. IMHO it's no big deal if tests run slow. If you
have good test coverage, the whole test run will be slow anyway. For
that purpose we have continuous integration tools like buildbot. That
will happily run your whole testsuite even under valgrind and bug the
developer via IRC/mail/whatever.
> - the important aspect about a unit test is the testing it does, not the
>testing framework matter. as such, a testing framework doesn't need to
>be big, here is one that is implemented in a whole 4 lines of C source,
>it gets this point across very well: ;)
>  http://www.jera.com/techinfo/jtns/jtn002.html
>   
True, but reinventing the wheel usually just means repeating errors.
> - in the common case, test results should be reduced to a single boolean:
>  "all tests passed" vs. "at least one test failed"
>many test frameworks provide means to count and report failing tests
>(even automake's standard check:-rule), there's little to no merit to
>this functionality though.
>having/letting more than one test fail and to continue work in an
>unrelated area rapidly leads to confusion about which tests are
>supposed to work and which aren't, especially in multi-contributor setups.
>figuring whether the right test passed, suddenly requires scanning of
>the test logs and remembering the last count of tests that may validly
>fail. this defeats the purpose using a single quick make check run to
>be confident that one's changes didn't introduce breakage.
>as a result, the whole test harness should always either succeed or
>be immediately fixed.
>   
Totally disagree. The whole point of using the fork-based approach
together with setup/teardown hooks is to provide a sane test environment
for each case. When you run the test suite on a build bot, you want to
know the overall state (percentage of pass/fail) plus a list of
tests that fail, so that you can fix the issues the tests uncovered.
GStreamer and many apps based on it use a nice logging framework (which
has previously also been offered for GLib integration (glog)). The tests
create logs that help to understand the problem.
> - for reasons also mentioned in the afformentioned blog entry it might
>be a good idea for Gtk+ as well to split up tests into things that
>can quickly be checked, thoroughly be checked but take long, and into
>performance/benchmark tests.
>these can be executed by make targets check, slowcheck and perf
>respectively.
>
> - for tests that check abort()-like behvaior, it can make sense to fork-off
>a test program and check whether it fails in the correct place.
>allthough this type of checks are the minority, the basic
>fork-functionality shouldn't be reimplemented all over again and warrants
>a test utility function.
>   
This is available in 'check'.
> - for time bound tasks it can also make sense to fork a test and after
>a certain timeout, abort and fail the test.
>   
This is available in 'check'.
> - some test suites offer formal setup mechnisms for test "sessions".
>i fail to see the necessity for this. main() { } provides useful test
>grouping just as well, this idea is applied in an example below.
>   
See above.
> - multiple tests may need to support the same set of command line arguments
>e.g. --test-slow or --test-perf as outlined in the blog entry.
>it makes sense to combine this logic in a common test utility function,
>usually pretty small.
>   
Agree here. The tests should forward their argc/argv (except when t

Re: Gtk+ unit tests (brainstorming)

2006-11-10 Thread Carl Worth
On Tue, 31 Oct 2006 10:26:41 -0800, Carl Worth wrote:
> On Tue, 31 Oct 2006 15:26:35 +0100 (CET), Tim Janik wrote:
> > i.e. using averaging, your numbers include uninteresting outliers
> > that can result from scheduling artefacts (like measuring a whole second
> > for copying a single pixel), and they hide the interesting information,
> > which is the fastest possible performance encountered for your test code.
>
> If computing an average, it's obviously very important to eliminate
> the slow outliers, because they will otherwise skew it radically. What
> cairo-perf is currently doing for outliers is really cheesy,
> (ignoring a fixed percentage of the slowest results). One thing I
> started on was to do adaptive identification of outliers based on the
> "> Q3 + 1.5 * IQR" rule as discussed here:
>
>   http://en.wikipedia.org/wiki/Outlier

For reference (or curiosity), in cairo's performance suite, I've now
changed the cairo-perf program, (which does "show me the performance
for the current cairo revision"), to report minimum (and median) times
and it does do the adaptive outlier detection mentioned above.

But when I take two of these reports generated separately and compare
them, I'm still seeing more noise than I'd like to see, (things like a
40% change when I _know_ that nothing in that area has changed).

I think one problem that is happening here is that even though we're
doing many iterations for any given test, we're doing them all right
together so some system-wide condition might affect all of them and
get captured in the summary.

So I've now taken a new approach which is working much better. What
I'm doing now for cairo-perf-diff, which does "show me the performance
difference between two different revisions of cairo", is to save the
raw timings for every iteration of every test. Then, statistics are
generated only just before the comparison. This makes it easy to go
back and append additional data if some of the results look off. This
has several advantages:

 * I can append more data only for tests where the results look bad,
   so that's much faster.

 * I can run fewer iterations in the first place, since I'll be
   appending more later as needed. This makes the whole process much
   faster.

 * Appending data later means that I'm temporally separating runs for
   the same test and library version, so I'm more immune to random
   system-wide disturbances.

 * Also, when re-running the suite with only a small subset of the
   tests, the two versions of the library are compared at very close
   to the same time, so system-wide changes are less likely to make a
   difference in the result.

I'm really happy with the net result now. I don't even bother worrying
about not using my laptop while the performance suite is running
anymore, since it's quick and easy to correct problems later. And when
I see the results, if some of them look funny, I re-run just those
tests, and sure enough the goofy stuff either just disappears,
(validating my assumption that it was bogus), or it sticks around no
matter how many times I re-run it, (leading me to investigate and
learn about some unexpected performance impact).

And it caches all of those timing samples so it doesn't have to
rebuild or re-run the suite to compare against something it has seen
before, (the fact that git has hashes just sitting there for the
content of every directory made this easy and totally free). The
interface looks like this:

# What's the performance impact of the latest commit?
cairo-perf-diff HEAD

# How has performance changed from 1.2.0 to 1.2.6? from 1.2.6 to now?
cairo-perf-diff 1.2.0 1.2.6
cairo-perf-diff 1.2.6 HEAD

# As above, but force a re-run even though there's cached data:
cairo-perf-diff -f 1.2.6 HEAD

# As above, but only re-run the named tests:
cairo-perf-diff -f 1.2.6 HEAD -- stroke fill

The same ideas could be implemented with any library performance
suite, and with pretty much any revision control system. It is handy
that git makes it easy to name ranges of commits. So, if I
wanted a commit-by-commit report of every change that is unique to
some branch, (let's say, what's on HEAD since 1.2 split off), I could
do something like this:

for rev in $(git rev-list 1.2.6..HEAD); do
    cairo-perf-diff $rev
done

-Carl

PS. Yes, it is embarrassing that no matter what the topic I end up
plugging git eventually.




Re: Gtk+ unit tests (brainstorming)

2006-11-14 Thread Tim Janik
On Thu, 26 Oct 2006, Iago Toral Quiroga wrote:

>> - in the common case, test results should be reduced to a single boolean:
>>  "all tests passed" vs. "at least one test failed"
>>many test frameworks provide means to count and report failing tests
>>(even automake's standard check:-rule), there's little to no merit to
>>this functionality though.
>>having/letting more than one test fail and to continue work in an
>>unrelated area rapidly leads to confusion about which tests are
>>supposed to work and which aren't, especially in multi-contributor setups.
>>figuring whether the right test passed, suddenly requires scanning of
>>the test logs and remembering the last count of tests that may validly
>>fail. this defeats the purpose using a single quick make check run to
>>be confident that one's changes didn't introduce breakage.
>>as a result, the whole test harness should always either succeed or
>>be immediately fixed.
>
> I understand your point, however I still think that being able to get a
> wider report with all the tests failing at a given moment is also
> interesting (for example in a buildbot continuous integration loop, like
> the one being prepared by the build-brigade). Besides, if there is a
> group of people that want to work on fixing bugs at the same time, they
> would need to get a list of tests failing, not only the first one.

well, you can get that to some extent automatically, if you invoke
   $ make -k check
going beyond that would be a bad trade-off i think, because:
a) tests are primarily in place to ensure certain functionality is
implemented correctly and continues to work;
b) if things break, tests need to be easy to debug. basically, most of the
time a test fails, you have to engage the debugger, read code/docs,
analyse and fix. tests that need forking or are hard to understand
get in the way of this process, so should be avoided.
c) implementation and test code often has dependencies that won't allow
testing beyond the occurrence of an error. a simple example is:
  o = object_new();
  ASSERT (o != NULL); /* no point to continue beyond this on error */
  test_function (o);
a more intimidating case is:
  main() {
    test_gtk_1();
    test_gtk_2();
    test_gtk_3();
    test_gtk_4();
    test_gtk_5();
    // ...
  }
if any of those test functions (say test_gtk_3) produces a gtk/glib
error/assertion/warning/critical, the remaining test functions (4, 5, ...)
are likely to fail for bogus reasons because the libraries entered an
undefined state.
reports of those subsequent errors (which are likely to be very
misleading) are useless at best and confusing (in terms of which error
really matters) at worst.
yes, forking for each of the test functions works around that (provided
they are as independent of one another as in the example above), but again,
this complicates the test implementation (it's not an easy-to-understand
test program anymore) and debuggability, i.e. it affects the 2 main
properties of a good test program.

to sum this up, reporting multiple fine-grained test failures may have some
benefits, mainly those you outlined. but it comes at a certain cost, i.e.
test code complexity and debugging hindrance, which work against the two
main properties of good test programs.
also, consider that "make -k check" can still get you reports on multiple
test failures, just at a somewhat lower granularity. in fact, it's just low
enough to avoid bogus reports.
so, weighing the options, adding fork mode when you don't have to (i.e. other
than for checking g_error implementations) provides questionable benefits at
significant costs.
that's not an optimal trade-off for gtk test programs i'd say, and i'd expect
the same to hold for most other projects.

>> - GLib based test programs should never produce a "CRITICAL **:" or
>>"WARNING **:" message and succeed. the reasoning here is that CRITICALs
>>and WARNINGs are indicators for an invalid program or library state,
>>anything can follow from this.
>>since tests are in place to verify correct implementation/operation, an
>>invalid program state should never be reached. as a consequence, all tests
>>should upon initialization make CRITICALs and WARNINGs fatal (as if
>>--g-fatal-warnings was given).
>
> Maybe you would like to test how the library handles invalid input. For
> example, let's say we have a function that accepts a pointer as
> parameter, I think it is worth knowing if that function handles safely
> the case when that pointer is NULL (if that is a not allowed value for
> that parameter) or if it produces a segmentation fault in that case.

no, it really doesn't make sense to test functions outside the defined
value ranges. that's because when implementing, the only thing you need
to actually care about from an API perspective is: the defined value ranges.
b

Re: Gtk+ unit tests (brainstorming)

2006-11-15 Thread Iago Toral Quiroga
El mar, 14-11-2006 a las 15:33 +0100, Tim Janik escribió:
> > I understand your point, however I still think that being able to get a
> > wider report with all the tests failing at a given moment is also
> > interesting (for example in a buildbot continuous integration loop, like
> > the one being prepared by the build-brigade). Besides, if there is a
> > group of people that want to work on fixing bugs at the same time, they
> > would need to get a list of tests failing, not only the first one.
> 
> well, you can get that to some extend automatically, if you invoke
>$ make -k check

Yes, if we split the tests into independent test programs I think that's
a reasonable approach.

> going beyond that would be a bad trade off i think because:

[...]

> c) implementation and test code often has dependencies that won't allow
> to test beyond the occourance of an error. a simple example is:
>   o = object_new();
>   ASSERT (o != NULL); /* no point to continue beyond this on error */
>   test_function (o);
> a more intimidating case is:
>   main() {
> test_gtk_1();
> test_gtk_2();
> test_gtk_3();
> test_gtk_4();
> test_gtk_5();
> // ...
>   }
> if any of those test functions (say test_gtk_3) produces a gtk/glib
> error/assertion/warning/critical, the remaining test functions (4, 5, ...)
> are likely to fail for bogus reasons because the libraries entered
> undefined state.
> reports of those subsequent errors (which are likely to be very
> misleading) is useless at best and confusing (in terms of what error 
> really
> matters) at worst.
> yes, forking for each of the test functions works around that (provided
> they are as independent of one another as in the example above), but 
> again,
> this complicates the test implementation (it's not an easy to understand
> test program anymore) and debuggability, i.e. affectes the 2 main
> properties of a good test program.

Mmm... actually, based on my experience using Check and fork mode, it
does not complicate the test implementation beyond adding a gtk_init()
to each forked test. Forking the tests is done transparently by Check
based on an environment variable. The same applies to debugging:
although it is true that debugging a forked test program is annoying,
you can disable fork mode when debugging by just switching the
environment variable:

$ CK_FORK=no gdb mytest

This shouldn't be a problem if we stop test execution after the first
failed test.

> > Maybe you would like to test how the library handles invalid input. For
> > example, let's say we have a function that accepts a pointer as
> > parameter, I think it is worth knowing if that function handles safely
> > the case when that pointer is NULL (if that is a not allowed value for
> > that parameter) or if it produces a segmentation fault in that case.
> 
> no, it really doesn't make sense to test functions outside the defined
> value ranges. that's because when implementing, the only thing you need
> to actually care about from an API perspective is: the defined value ranges.
> besides that, value rtanges may compatibly be *extended* in future versions,
> which would make value range restriction tests break unecessarily.
> if a funciton is not defined for say (char*)0, adding a test that asserts
> certain behaviour for (char*)0 is effectively *extending* the current
> value range to include (char*)0 and then testing the proper implementation
> of this extended case. the outcome of which would be a CRITICAL or a segfault
> though, and with the very exception of g_critical(), *no* glib/gtk function
> implements this behaviour purposefully, compatibly, or in a documented way.
> so such a test would at best be bogus and uneccessary.

I think the main difference here between your point of view and mine is
that I'm seeing the API from the user side, while you see it from the
developer side. Let me explain:

From a developer point of view, it is ok to say that an API function
only works within a concrete range of values and to test that it really
works within that range. However, in practice, user programs are full
of bugs, which usually means that under certain conditions they are not
using the APIs as they are supposed to be used. That said, any user would
prefer such an API to handle those situations as safely as possible: if I'm
writing a GTK+ application I'd prefer it to safely handle a misuse on
my side and warn me about the issue, rather than break badly due to a
segmentation fault and make me lose all my data ;)

> > I'll add here some points supporting Check ;):
> 
> ok, adressing them one by one, since i see multiple reasons for not
> using Check ;)
> 
[...]
> it's not clear that Check (besides than being an additional dependency in
> itself) fullfils all the portability requirements of glib/gtk+ for these
> cases though.
> 
[...]
> - Check may be widely used, but is presented as "[...

Re: Gtk+ unit tests (brainstorming)

2006-11-16 Thread Tim Janik
On Thu, 26 Oct 2006, Tristan Van Berkom wrote:

> Tim Janik wrote:

>>> (sometime one property has no
>>> meaning if another one hasnt been setup yet - in which case a
>>> g_return_if_fail() guard would be appropriate).
>>
>>
>> wrong, some proeprty values are intentionally set up to support
>> free-order settings. e.g. foo=5; foo-set=true; or foo-set=true; foo=5;
>> every order restriction or property dependency that is introduced in the
>> proeprty API makes generic handling of properties harder or sometimes
>> impossible so should be avoided at all costs.
>> (examples for generic property handling code are language bindings,
>> property editors, glade, gtk-doc)
>
> I think thats a little uncooperative - wouldnt you say on the other hand
> - that
> the independance of property ordering - aside from possible real weird
> corner
> cases - should be promoted as much as possible to make life easier for
> language
> binding writers, property editors and such ?

yes, that is exactly what i was trying to say. sorry if it was hard to
understand.
if at all possible, any ordering should be supported when setting
properties on an object. at the very least, no restrictions should be
introduced by an object implementation that can reasonably be worked
around.

for unit tests, we can simply do things like pick properties in random
orders and set them to random values. we can then successively fix gtk
cases that unreasonably rely on property orders, or add rules to the
test cases about non-fixable ordering requirements.
note that *some* ordering support is already present in the GObject
API by flagging properties as CONSTRUCT or CONSTRUCT_ONLY.
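
such a shuffled property setter could look roughly like this (sketch only;
it just reuses the default values, a real test would also pick random
values within the declared ranges):

#include <glib-object.h>

/* apply an object's writable, non-construct-only properties in random order */
static void
set_properties_in_random_order (GObject *object)
{
  guint n_props = 0, i;
  GParamSpec **pspecs =
    g_object_class_list_properties (G_OBJECT_GET_CLASS (object), &n_props);
  for (i = n_props; i > 1; i--)             /* Fisher-Yates shuffle */
    {
      guint j = g_random_int_range (0, i);
      GParamSpec *tmp = pspecs[i - 1];
      pspecs[i - 1] = pspecs[j];
      pspecs[j] = tmp;
    }
  for (i = 0; i < n_props; i++)
    if ((pspecs[i]->flags & G_PARAM_WRITABLE) &&
        !(pspecs[i]->flags & G_PARAM_CONSTRUCT_ONLY))
      {
        GValue value = { 0, };
        g_value_init (&value, G_PARAM_SPEC_VALUE_TYPE (pspecs[i]));
        g_param_value_set_default (pspecs[i], &value);
        g_object_set_property (object, pspecs[i]->name, &value);
        g_value_unset (&value);
      }
  g_free (pspecs);
}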

>>> Ummm, while I'm here - I'd also like to say that - (I'm not about to
>>> dig up the quote from the original email but) - I think that there
>>> is some value to reporting all the tests that fail - even after one
>>> of the tests has failed, based on the same principals that you'd
>>> want your compiler to tell you everything that went wrong in its
>>> parse (not just the first error),
[...]
> Sure I'll elaborate, what I mean is - if you dont want to pollute
> the developers console with alot of errors from every failing test,
> you can redirect them to some log file and only say
> "at least one test failed" - maybe depict which one was the first
> failure - and then point to the log.
>
> Your arguments against collecting all the failures seem to
> apply fine to gcc output too - and yes you might usually end up just
> fixing the first error and recompiling, yet I manage to still appriciate
> this feature of verbosity - delivered at the not too high cost of
> wording a makefile a little differently (for unit tests that is) - i'd
> think.

ok thanks, i think i get the idea now ;)
i've addressed getting more than a single test failure reported without any
need to modify tests in yesterday's email on this topic:
   http://mail.gnome.org/archives/gtk-devel-list/2006-November/msg00077.html
i'd hope that also covers what you're looking for.

> Cheers,
> -Tristan

---
ciao TJ


Re: Gtk+ unit tests (brainstorming)

2006-11-16 Thread Iago Toral Quiroga
El mié, 15-11-2006 a las 10:51 +0100, Iago Toral Quiroga escribió:
> > > I'll add here some points supporting Check ;):
> > 
> > ok, adressing them one by one, since i see multiple reasons for not
> > using Check ;)
> > 
> [...]
> > it's not clear that Check (besides than being an additional dependency in
> > itself) fullfils all the portability requirements of glib/gtk+ for these
> > cases though.
> > 
> [...]
> > - Check may be widely used, but is presented as "[...] at the moment only
> >   sporadically maintained" (http://check.sourceforge.net/).
> >   that alone causes me to veto any Check dependency for glib/gtk already ;)
> 
I've just asked Chris Pickett (Check maintainer) about these issues, so
he can confirm. I'll forward his opinion in a later mail.

Here it is:

--

Hi Iago,

Well, I looked at most C unit testing frameworks out there, and ended up
settling on Check.

That I wrote that Check is sporadically maintained on the web page means
that it has reached a point where it is fairly stable, and it does the job
well for people who use it.  You can quote me on that if you want (in
fact, I'm going to update the web page).  There are lots of users, a
very low-volume mailing list, and not very many open bugs.  Search
Google for srunner_create or something and you will see what I mean.

It currently has some problems with failing its own unit tests when
built, but I think that has to do with some hard-coded timeouts in the
unit tests and the speed of newer processors.  I haven't actually
encountered any problems using it myself.

As for portability, I don't think there are any serious issues, I 
remember seeing a Windows patch somewhere.  I recently updated it to 
Autoconf 2.50+ and switched the documentation from DocBook to Texinfo.

I know people love to rewrite the world, but in this case, I would 
recommend just using Check for now, and if any problems are encountered,
then write some throwaway scripts to convert the tests to a new format, 
or fix what's actually broken.  It can't be difficult to do either way, 
and I think it would save a lot of time.  Right now, they're proposing a
big speculative design before knowing through experience what their 
needs for GTK+ really are, and experience will help a lot: either they 
will say, "Oh, hey, Check is actually great!" or they will say, "Damn, 
these are all the things that sucked about Check and let's make sure we 
get them right this time!"

Probably my best general advice is not to write too many tests, and not 
to go crazy testing assertions, but that has nothing to do with Check. 
You might want to get gcov working at the same time.

You might also do well to send some emails to other projects, e.g. 
GStreamer, and find out what they think of it, whether they would 
recommend it, etc. etc.  I'd be interested to know what they say.  Hmm, 
I just looked at email #3 and I see Stefan from GStreamer recommending 
Check.  Well, I would just listen to him, quite frankly.

Cheers,
Chris

Iago Toral Quiroga wrote:
> Hi Chris,
> 
> there is some debate in GTK+ about unit testing. I tried to convince
> people to use Check as framework but it seems they prefer doing
> something from scratch.
> 
> The main points against Check that they have provided are:
> 
>* The web page (check.sourceforge.net) states that Check is
> sporadically maintained.
> 
>* It's not clear that Check fullfils all the portability requirements
> of glib/gtk+.
> 
>* They think the functionality needed to fulfill glib/gtk+ testing
> needs can be implemented ad-hoc with not too much effort, avoiding
> another dependency.
> 
> What's your opinion about this issues?
> 
> Just in case you want to follow the debate:
> http://mail.gnome.org/archives/gtk-devel-list/2006-October/msg00093.html
> http://mail.gnome.org/archives/gtk-devel-list/2006-October/msg00129.html
> http://mail.gnome.org/archives/gtk-devel-list/2006-October/msg00167.html
> http://mail.gnome.org/archives/gtk-devel-list/2006-November/msg00077.html
> 
> (Actually this is a bigger debate than just using Check or not, so I
> wrote only some URLs that touch the Check debate at some point).
> 
> Cheers,
> Iago.


