Re: A tale of waiting

2009-06-29 Thread Federico Mena Quintero
On Tue, 2009-06-23 at 22:16 +0200, Benjamin Otte wrote:
 I have been on a quest to improve performance of the file chooser

First, thanks for taking on this work.  Kill GtkFileSystem is a very
worthwhile goal.

  * The sort function was called way too often. So I added the option
 to freeze and thaw the tree view. Now the sort function was only
 called once or twice.

Hmm.  I haven't read your branch yet, but some things I remember from
profiling the file chooser a while ago:

- It's pretty bad that GtkTreeModel only has row-inserted for a single
row, instead of rows-inserted --- it gives you automatic insertion
sort, which sucks.  So for every single file we read in, the treeview
has quite a bit of machinery to run.  This is why (at least before the
GIO work; I don't know about the current state) I made the file chooser
do this:

  1. Start with an empty model not hooked to the treeview.
  2. Start a timer.
  2. Start loading a folder and populating the model.
  3. If the timer expires, put the model into the treeview.
  4. If the folder finishes loading before the timer expires,
 put the model into the treeview.

That's the LoadState enumeration, by the way.  Ideally, small folders
will be completely populated into the model before the timer expires.
We can work on making this as fast as possible, and *then* work on what
happens when the poor model actually gets put inside the treeview.

 What is left to do?
  *  * reimplement directory monitoring. I did never get around adding that 
 code.

Don't worry about this.  We can have a custom model, just for directory
trees, once we decide to delve into that for SELECT_FOLDER mode.

  * evaluate fixed height mode for the tree view, it might make things
 even faster without losing features.

Hmmm, do we use the automatic-column-widths thing?  That should
definitely be turned off; we can hardcode an initial starting width of
N ems to avoid measuring all the rows.

  * port search and recent files to GtkFileSystemModel, to get rid of
 more special casing code and make them faster. (Woohoo!)

I'm not 100% convinced that this is the right thing to do, but I'll read
your branch before commenting :)

  * The async implementation of g_file_enumerator_next_files_async() is
 very suboptimal, as it stops after N files are enumerated, and waits
 for another call to the function to resume.

See the thing above about the use of a timer.  We should be able to call
that function several times before the timer expires, and still avoid
re-sorting because the model is not yet in the treeview.

 And now the obvious question: Should I just merge it to master when
 I'm done with the regressions?

Please give me some time to review this.

Will you be at GCDS?  I'd love to do patch review together.

  Federico

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Jernej Simončič
On Thu, 25 Jun 2009 12:12:05 +0200, Kristian Rietveld wrote:

 As a side comment, I am not really sure how fair it is to compare a
 full blown GUI to a command line utility with its output redirected to
 /dev/null.  I do agree that a load time of 4 to 6 seconds is too long.

I don't know about Linux, but on Windows the native chooser displays large
directories almost instantly, while the GTK+ one is very slow (eg.
directory with 970 images takes about 2 seconds to display in the native
file chooser, while in GTK+'s it takes 12 seconds).

-- 
 Jernej Simončič  http://eternallybored.org/ 

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Mathieu Lacage
On Fri, 2009-06-26 at 09:31 +0100, Bastien Nocera wrote:
 On Fri, 2009-06-26 at 10:25 +0200, Jernej Simončič wrote:
  On Thu, 25 Jun 2009 12:12:05 +0200, Kristian Rietveld wrote:
  
   As a side comment, I am not really sure how fair it is to compare a
   full blown GUI to a command line utility with its output redirected to
   /dev/null.  I do agree that a load time of 4 to 6 seconds is too long.
  
  I don't know about Linux, but on Windows the native chooser displays large
  directories almost instantly, while the GTK+ one is very slow (eg.
  directory with 970 images takes about 2 seconds to display in the native
  file chooser, while in GTK+'s it takes 12 seconds).
 
 Except that the Windows explorer actually has knowledge of the
 filesystem itself. This was mentioned and discussed with Darin Adler

Was this discussion public ?

 (very many years ago, as he mentioned he got that information from the
 BeFS developer (Italian lad, whose name escapes me right now)).

That would be Dominic Giampaolo.

Mathieu

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Bastien Nocera
On Fri, 2009-06-26 at 10:54 +0200, Mathieu Lacage wrote:
 On Fri, 2009-06-26 at 09:31 +0100, Bastien Nocera wrote:
  On Fri, 2009-06-26 at 10:25 +0200, Jernej Simončič wrote:
   On Thu, 25 Jun 2009 12:12:05 +0200, Kristian Rietveld wrote:
   
As a side comment, I am not really sure how fair it is to compare a
full blown GUI to a command line utility with its output redirected to
/dev/null.  I do agree that a load time of 4 to 6 seconds is too long.
   
   I don't know about Linux, but on Windows the native chooser displays large
   directories almost instantly, while the GTK+ one is very slow (eg.
   directory with 970 images takes about 2 seconds to display in the native
   file chooser, while in GTK+'s it takes 12 seconds).
  
  Except that the Windows explorer actually has knowledge of the
  filesystem itself. This was mentioned and discussed with Darin Adler
 
 Was this discussion public ?

I believe it was discussion around gnome-vfs at the time, at GUADEC in
Denmark (which doesn't make a few of us feel any younger).

  (very many years ago, as he mentioned he got that information from the
  BeFS developer (Italian lad, whose name escapes me right now)).
 
 That would be Dominic Giampaolo.

That would be him, thanks.

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Dirk-Jan Binnema
2009/6/26 Bastien Nocera had...@hadess.net

 On Fri, 2009-06-26 at 10:25 +0200, Jernej Simončič wrote:
  On Thu, 25 Jun 2009 12:12:05 +0200, Kristian Rietveld wrote:
 
  I don't know about Linux, but on Windows the native chooser displays
 large
  directories almost instantly, while the GTK+ one is very slow (eg.
  directory with 970 images takes about 2 seconds to display in the native
  file chooser, while in GTK+'s it takes 12 seconds).

  Except that the Windows explorer actually has knowledge of the
  filesystem itself. This was mentioned and discussed with Darin Adler
  (very many years ago, as he mentioned he got that information from the
  BeFS developer (Italian lad, whose name escapes me right now)).


For a little fs-specific trick on Linux: If you actually have to *open* the
files (e.g, for sniffing), you can make things quite a bit faster by sorting
the files in order of inode first, if you are on ext3 (and ext4 I suppose,
as well as other hashed-b-tree supporting file systems). I got around 40%
improvement.

On file systems that don't support it, the overhead seems negligible.
For some details: http://djcbflux.blogspot.com/2008/10/seek-destroy.html

Best wishes,
Dirk.
___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Mathieu Lacage
On Fri, 2009-06-26 at 12:09 +0300, Dirk-Jan Binnema wrote:

 For a little fs-specific trick on Linux: If you actually have to
 *open* the files (e.g, for sniffing), you can make things quite a bit
 faster by sorting the files in order of inode first, if you are on
 ext3 (and ext4 I suppose, as well as other hashed-b-tree supporting
 file systems). I got around 40% improvement.

the beos file manager used the same trick on befs (this bit of trivia
comes from pavel cisler, the beos file manager developer).

Mathieu

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Alexander Larsson
On Fri, 2009-06-26 at 12:09 +0300, Dirk-Jan Binnema wrote:
 
 
 2009/6/26 Bastien Nocera had...@hadess.net
 On Fri, 2009-06-26 at 10:25 +0200, Jernej Simončič wrote:
  On Thu, 25 Jun 2009 12:12:05 +0200, Kristian Rietveld
 wrote:
 
  I don't know about Linux, but on Windows the native chooser
 displays large
  directories almost instantly, while the GTK+ one is very
 slow (eg.
  directory with 970 images takes about 2 seconds to display
 in the native
  file chooser, while in GTK+'s it takes 12 seconds).
 
 
  Except that the Windows explorer actually has knowledge of
 the
  filesystem itself. This was mentioned and discussed with
 Darin Adler
  (very many years ago, as he mentioned he got that
 information from the
  BeFS developer (Italian lad, whose name escapes me right
 now)).
 
 For a little fs-specific trick on Linux: If you actually have to
 *open* the files (e.g, for sniffing), you can make things quite a bit
 faster by sorting the files in order of inode first, if you are on
 ext3 (and ext4 I suppose, as well as other hashed-b-tree supporting
 file systems). I got around 40% improvement.

Gio already does this.


___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Jernej Simončič
On Fri, 26 Jun 2009 09:31:20 +0100, Bastien Nocera wrote:

 Except that the Windows explorer actually has knowledge of the
 filesystem itself.

Does this apply to directories shared from Linux through Samba (because
that's where I was testing)?

-- 
 Jernej Simončič  http://eternallybored.org/ 

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-26 Thread Benjamin Otte
I'm done now. And I've filed
http://bugzilla.gnome.org/show_bug.cgi?id=587091 as merge request.

And to keep you up to date:

 What is left to do?
  *  * reimplement directory monitoring. I did never get around adding that 
 code.

This was implemented.

  * Fix the usage of a filter on the mime type. Currently we don't
 query the mime type (it requires file sniffing after all), so we never
 get matches.

The mime type is now queried, as it doesn't cause big performance
regressions. (see mails from Mathias and Bastien).

  * File bug(s) about the GValue and interface slowness
  * evaluate fixed height mode for the tree view, it might make things
 even faster without losing features.

See my mail exchange with Kris.

  * port search and recent files to GtkFileSystemModel, to get rid of
 more special casing code and make them faster. (Woohoo!)

done.

  * 50+% of the remaining CPU is spent in GtkTreeView's validate_row().
 I have no clue what that function even does. Is there a way to get rid
 of it?

See mails with Kris

  * 20+% of time is spent in enumerating the files, with
 lookup_attribute() in gio/gfileinfo.c accounting for 1/3rd of that
 time. There should be a way to use the numeric ids directly instead of
 always looking them up. In GLocalFileInfo it would make sense to me to
 use numeric ids directly. This is an even bigger problem in Nautilus
 (remember: it took 23s in the above test), as it queries a lot more
 attributes.

This is http://bugzilla.gnome.org/show_bug.cgi?id=587089

  * The async implementation of g_file_enumerator_next_files_async() is
 very suboptimal, as it stops after N files are enumerated, and waits
 for another call to the function to resume. It would be better if it
 just kept going, so the next call can return already stored files. The
 current behavior can cause excessive sorting in the current
 implementation.

And this is bug http://bugzilla.gnome.org/show_bug.cgi?id=587090

Cheers,
Benjamin
___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-25 Thread Kristian Rietveld
Hi,

A brief reply on the tree view bits.  I have to admit that I've not
looked at the code.

On Tue, Jun 23, 2009 at 10:16 PM, Benjamin Otteo...@gnome.org wrote:
 The first thing I did was set a goal. I decided to target the ls
 command, because I know it to be fast and because there's no reason my
 terminal should be faster than my file chooser for listing files -
 even if Behdad and the Chris'es maintain it. So i ran time ls -la 
 /dev/null, it took about 0.1s. This means that everything that takes
 more than 2% total CPU time needs to go away. I should note that I do
 not care about memory usage a lot, only about execution speed, as a
 file chooser should go away quickly and give you back your memory.

As a side comment, I am not really sure how fair it is to compare a
full blown GUI to a command line utility with its output redirected to
/dev/null.  I do agree that a load time of 4 to 6 seconds is too long.

  * Sorting the list
 The file chooser uses a GtkTreeModelSort to sort. A huge amount of CPU
 was spent trying to sort. And when looking at the ls time (which does
 sort its output), this shouldn't happen.

I would guess this is due to the files being added to the model one by
one, which means that they are sorted one by one by the
GtkTreeModelSort.  This uses insertion sort on an array, which is
slow.

 Next I implemented sorting. Because I use an array, I can use qsort(),
 which is fast. I had thought about switching to GSequence to get ever
 closer to GtkListStore, but I did not do that, and one of the reasons
 is that GSequence uses insertion sort, which is quite slow. Sorting
 40.000 elements takes over a second in a small test I did.
 After I had implemented sorting, the file chooser was abysmally slow
 again - taking around 25s. This was for two reasons:
  * The sort function was called way too often. So I added the option
 to freeze and thaw the tree view. Now the sort function was only
 called once or twice.

Where is this sort function located?  Judging from the following comment ...

  * gtk_tree_model_get() being called in the iter comparison functions
 took almost all the CPU. 25% of the time was due to needlessly copying
 GValues, but 75% of the time was spent in lookup of the GtkTreeModel
 interface. As the comparison function is in a critical spot for
 performance, this was inacceptable. Also, using
 gtk_tree_model_get_value() directly didn't get rid of any of these
 problems. So I added an access function to GtkFileSystemModel and the
 problem was gone.

... I guess the sort function is located in the file chooser dialog
(as opposed to being inside the model) and thus has to use the
gtk_tree_model_get() calls.  Since the sort function is called often
and the main part of the sort function is to fetch the values needed
for comparison from the model, it is not weird that there is a large
overhead introduced by GObject's implementation of interfaces.  I
don't know how many different sort functions are used in the file
chooser.  If it is only one, you could make GtkFileSystemModel
implement the GtkTreeSortable interface (I get the feeling you've
already done that) and move the sort function to be internal to
GtkFileSystemModel (so it can access all values directly).  Make it
the default sort function that can be overridden using the
GtkTreeSortable interface.  This is in fact a circumvention of the
GtkTreeSortable and GtkTreeModel interfaces, but it does eliminate the
GtkTreeModel overhead you are seeing.  (Also, if the default sort
function is overridden, that new function will suffer from the model
overhead again.)

Adding freeze and thaw to tree view is not an option.  It has been
designed in the past to live without freeze and thaw and the addition
of these functions has been repeatedly refused.  It would be good to
know why the sort function is called so often -- is this because the
files come in one by one?

  * evaluate fixed height mode for the tree view, it might make things
 even faster without losing features.

Yes, this will remove the full pass through the model.  More on this below.

 There are some remaining performance issues, that I didn't touch on
 yet, but that are quite important for really catching up to ls.
  * 50+% of the remaining CPU is spent in GtkTreeView's validate_row().
 I have no clue what that function even does. Is there a way to get rid
 of it?

validate_row() is the main part of the validation functionality.  What
it basically does is, given a row, it will set the data (retrieved
from the model) on all cell renderers and then call the get_size()
method on each renderer.  When done, the tree view knows the minimal
size this row needs to display itself.  The proper row size is then
determined and set.  By default, tree view does a full pass through
the model to do this for each row.  At the end, the sizes of the
scroll bars will be fully correct.

Fixed height mode bypasses this by only measuring the first row, and
then set the same size for each row. 

Re: A tale of waiting

2009-06-24 Thread Soeren Sandmann
Matthias Clasen matthias.cla...@gmail.com writes:

  Next I implemented sorting. Because I use an array, I can use qsort(),
  which is fast. I had thought about switching to GSequence to get ever
  closer to GtkListStore, but I did not do that, and one of the reasons
  is that GSequence uses insertion sort, which is quite slow.
 
 Not sure why you keep repeating this, it is not true. GSequence is
 backed by a balanced tree, so sorting by insertion gives you O(n log
 n), so at least asymptotically,
 it is fine. The implementation may of course still be slower than
 qsort().

GSequence is indeed implemented with a randomized, balanced binary
tree (a treap[1] to be specific), so sorting is expected O(n log n).
While sorting a linked data structure is going to be slower than
qsort() on an array, 1 second for 40,000 elements sounds wrong,
so I'd be interested in seeing the benchmark.


Thanks,
Soren

[1] http://en.wikipedia.org/wiki/Treap
___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-24 Thread Alberto Ruiz
2009/6/23 Bastien Nocera had...@hadess.net:
 On Tue, 2009-06-23 at 22:16 +0200, Benjamin Otte wrote:
 I have been on a quest to improve performance of the file chooser
 lately. This post is about this process: what I measured, what I
 learned and what I patched.
 snip
  * Getting the mime type
 Getting the mime type for a file requires opening the file and
 checking the first bytes against the known patterns for files. Both
 operations take time. The mime type also is usually not required.


 It will only open the file if the suffix of the file can't be used,
 please read the shared-mime-info spec.

Yes, I've been looking into this code recently, there's a fast mode,
and it's only when the fast mode happens to be ambiguous, that a short
stream of the file is taken for sniffing.

 snip
  * Fix the usage of a filter on the mime type. Currently we don't
 query the mime type (it requires file sniffing after all), so we never
 get matches.

 As above.

 In majority of cases it should be a string match.
 snip
 And now the obvious question: Should I just merge it to master when
 I'm done with the regressions?

 Personally, I'd rather see the changes merged incrementally, rather than
 a wholesale rewrite going in. Makes it much easier to pinpoint what
 broke the file chooser (as is likely to happen, it's not like software
 is ever bug-free).

 Cheers

 ___
 gtk-devel-list mailing list
 gtk-devel-list@gnome.org
 http://mail.gnome.org/mailman/listinfo/gtk-devel-list




-- 
Un saludo,
Alberto Ruiz
___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-24 Thread Benjamin Otte
On Wed, Jun 24, 2009 at 4:39 AM, Matthias Clasen
matthias.cla...@gmail.com wrote:
 As Bastien already pointed out, this is just not true. Getting the
 mime type does not use sniffing in most cases. /usr/bin is a the worst
 case where extension-based mimetype detection breaks down, but it
 should do fine in your homedir, unless you are weird.

I'm sorry. I just assumed it always used content sniffing, because it
was prominently showing up in sysprof's output.
It also turns out I'm weird. I have a lot of log files in my home
directory with a custom extension and that triggers sniffing.

So I set out to measure it again, comparing the output of these two
commands in various directories:
time gvfs-ls -a
standard::name,standard::type,standard::display-name,standard::is-hidden,standard::is-backup,standard::size,time::modified
 /dev/null
time gvfs-ls -a
standard::name,standard::type,standard::display-name,standard::is-hidden,standard::is-backup,standard::size,time::modified,standard::content-type
 /dev/null
Those are the attributes used by my current file chooser, comparing
those two commands with a warm cache. Then I ran the same commands
again with a preceeding echo 1  /proc/sys/vm/drop_caches command:
 * my home dir: 0.2s vs 0.45s
1.6s vs 12.9s
This has the aforementioned log files that trigger caching.
 * /usr/bin: 0.15s vs 0.65s
 1.3s vs 18.2s
This is the worst case scenario.
 * my test dir: 1.8s vs 2.2s
7.4s vs 7.9s
This dir is interesting, because there's only .txt and .png files, so
no sniffing happens.

So it seems content-type sniffing incurs a 20% penalty for
g_file_query_info() if it takes the fast path, and is devastating if
it doesn't. Both of that is not nice and it'd be nice if populating
the file chooser would not be blocked by it, but it seems in most
cases it's not noticable. So I can just add the content-type back to
the file chooser and have one problem less to think about.

 Not sure why you keep repeating this, it is not true. GSequence is
 backed by a balanced tree, so sorting by insertion gives you O(n log
 n), so at least asymptotically,
 it is fine. The implementation may of course still be slower than qsort().

The biggest problem of course is calling the comparison function. And
I was under the impression that this number was significantly lower
for qsort. So I wrote a little program that counted them (attached).
And it turns out that:
- g_qsort_with_data() and g_sequence_sort() use roughly the same
number of comparisons
- glibc's qsort() takes 30% less comparisons than the other two.
- g_sequence_sort() has quite some overhead, though probably ignorable
for the file chooser

So I guess I will
a) file a bug about updating glib's qsort implementation
b) stop complaining about g_sequence_sort() being slow
The slowness I was seeing was probably related to the comparison
function using gtk_tree_model_get() (see previous mail).

  And now the obvious question: Should I just merge it to master when
  I'm done with the regressions?

 No, of course not. This needs to be reviewed and merged incrementally.
 For one thing,
 we don't know what other aspects of file chooser behaviour have been
 negatively affected by your optimization towards a single goal. Is
 scrolling still ok ? Does changing the sort column work reasonably ?
 Etc.

Yeah, the question was of course tongue-in-cheek.
I was wondering how to best proceed to get this code merged as sane as
possible, since it is a somewhat big chunk.


Benjamin
#include glib.h
#include stdlib.h

static guint sort;

static int
libc_compare_and_count (gconstpointer a, gconstpointer b)
{
  sort++;

  return *(gpointer **) a - *(gpointer **) b;
}

static int
compare_and_count (gconstpointer a, gconstpointer b, gpointer data)
{
  guint *count = data;

  (*count)++;

  return *(gpointer **) a - *(gpointer **) b;
}

static int
seq_compare_and_count (gconstpointer a, gconstpointer b, gpointer data)
{
  guint *count = data;

  (*count)++;

  return a - b;
}

int
main (int argc, char **argv)
{
  guint i;
  gpointer *list, *glist;
  GSequence *seq;
  guint N = 4;
  GTimer *timer;

  if (argc  1)
N = g_ascii_strtoull (argv[1], NULL, 0);

  list = g_new (gpointer, N);
  seq = g_sequence_new (NULL);
  for (i = 0; i  N; i++)
{
  list[i] = GINT_TO_POINTER (g_random_int ());
  g_sequence_append (seq, list[i]);
}
  glist = g_memdup (list, sizeof (gpointer) * N);

  timer = g_timer_new ();
  g_timer_start (timer);
  sort = 0;
  qsort (list, N, sizeof(gpointer), libc_compare_and_count);
  g_print (qsort(): %u - %gs\n, sort, g_timer_elapsed (timer, NULL));

  g_timer_reset (timer);
  sort = 0;
  g_qsort_with_data (glist, N, sizeof(gpointer), compare_and_count, sort);
  g_print (g_qsort_with_data(): %u - %gs\n, sort, g_timer_elapsed (timer, NULL));
  
  g_timer_reset (timer);
  sort = 0;
  g_sequence_sort (seq, seq_compare_and_count, sort);
  g_print (g_sequence_sort(): %u - %gs\n, sort, g_timer_elapsed 

Re: A tale of waiting

2009-06-24 Thread Matthias Clasen
On Wed, Jun 24, 2009 at 9:05 AM, Benjamin Otteo...@gnome.org wrote:


 So it seems content-type sniffing incurs a 20% penalty for
 g_file_query_info() if it takes the fast path, and is devastating if
 it doesn't. Both of that is not nice and it'd be nice if populating
 the file chooser would not be blocked by it, but it seems in most
 cases it's not noticable. So I can just add the content-type back to
 the file chooser and have one problem less to think about.

Yeah, doing mime type detection async in parallel to the loading has
some downsides too. It is somewhat disconcerting if the icons change
after the fact...


 So I guess I will
 a) file a bug about updating glib's qsort implementation

Here is one: http://bugzilla.gnome.org/show_bug.cgi?id=113203
___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-24 Thread Andrew Cowie
On Wed, 2009-06-24 at 10:37 -0400, Matthias Clasen wrote:

 Yeah, doing mime type detection async in parallel to the loading has
 some downsides too. It is somewhat disconcerting if the icons change
 after the fact...

Maybe when we don't know what the type is, and are finding out async, we
could put a spinner of some kind where the icon is until it resolves [in
the same way that a loading tab in Epiphany works]?

AfC
Sydney



signature.asc
Description: This is a digitally signed message part
___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


A tale of waiting

2009-06-23 Thread Benjamin Otte
I have been on a quest to improve performance of the file chooser
lately. This post is about this process: what I measured, what I
learned and what I patched.

I'll start with my motivation. Just like all my applications do with
their dotfiles, I use my home directory as a dumping ground for stuff.
PDFs of specs, files someone sent me or that I downloaded somewhere,
code I wrote, etc. All the stuff that doesn't have a place anywhere,
but is too important to throw away gets dumped into my home directory.
Currently on this laptop, my home directory contains 3380 files. And
whenever an application opens the file chooser, it shows my home
directory - either because it's the default directory or because I
explicitly told it to - it is my default dumping ground after all.
With a warm cache, the filechooser takes 4-6 seconds to display the
full list. It sometimes displays a partial list, but that is utterly
useless as this partial list is rapidly changing while more files get
added. Having to wait 4 seconds to select a file is not a good thing.
In fact it has become so annoying that last Friday I set out to change
it - after deciding that sorting my 3500 files would probably take
longer and wouldn't help other people either.

The first thing I did was set a goal. I decided to target the ls
command, because I know it to be fast and because there's no reason my
terminal should be faster than my file chooser for listing files -
even if Behdad and the Chris'es maintain it. So i ran time ls -la 
/dev/null, it took about 0.1s. This means that everything that takes
more than 2% total CPU time needs to go away. I should note that I do
not care about memory usage a lot, only about execution speed, as a
file chooser should go away quickly and give you back your memory.

Next, I set up a way to measure the performance of the file chooser.
For this I created a test directory containing 40.000 files like so:
  mkdir test-directory
  for i in `seq 1 2`; do echo This is a text file 
test-directory/file$i.txt; done
  for i in `seq 1 2`; do echo cp small-image.png
test-directory/image$i.png; done
Of course, this is only a test directory that might have
pathologically bad performance in certain cases, so keep this in mind.
I nonetheless think it does a pretty good job as a - even if somewhat
exaggerated - performance measurement for big directories. In
particular it does a very good job of finding O(n²) performance
issues.
For measuring I mostly use my watch. When posting overly accurate
results - I probably used time instead. The computer I'm running this
on is a 2nd generation Macbook with a dual-core 1.83GHz and enough
memory to always run the tests with a warm cache, so there is no
hard-disk IO happening. For finding the bottlenecks I used sysprof.

First I measured the time it takes in various applications to list the
test directory:
 *  0.8s - ls -la
 *  1.8s - gvfs-ls -a
standard::type,standard::is-hidden,standard::is-backup,standard::size,time::modified
 * 23s   - nautilus
 * 60s - thunar
 * 60s - the file chooser

Next, I looked, where most of the time is taken. These things were standing out:
 * Getting the mime type
Getting the mime type for a file requires opening the file and
checking the first bytes against the known patterns for files. Both
operations take time. The mime type also is usually not required.
 * Getting the icons
Actually loading the icon takes time. It's faster when the icon cache
can be used - which is not true for thumbnails, or in this case half
of the files - but even then it's slow. In fact, using the icon cache
is only about 30% faster than just loading the icon from disc. (see
bug 586161)
 * Sorting the list
The file chooser uses a GtkTreeModelSort to sort. A huge amount of CPU
was spent trying to sort. And when looking at the ls time (which does
sort its output), this shouldn't happen.
 * the GtkTreeCellDataFuncs
The file chooser used custom cell data funcs that computed the
required values on demand from a GFileInfo. Unfortunately, some of
these values are complex to compute - like the thumbnail. It would
make more sense to compute the values only once, cache them and use
gtk_tree_view_column_set_attributes() instead.
So the first thing I decided to do was rip all these things out and
make a tree model that quickly listed all the files as unsorted as
they come. For this I pretty much rewrote GtkFileSystemModel, as the
previous incarnation supported a tree-like layout (even though that
wasn't used) and a linked list for storage.
While I did that, I realized that a lot of code in the file chooser is
triplicated - it exists once each for browsing, recent files and
search. I tried to come up with a tree model that abstracts away these
differences so that the file chooser code could be simplified. So I
modified the GtkFileSystemModel API to look more like a GtkListStore.
By the time I had done that, it was Sunday evening, I had removed over
700 lines netto from gtkfilechooserdefault.c, had 

Re: A tale of waiting

2009-06-23 Thread Bastien Nocera
On Tue, 2009-06-23 at 22:16 +0200, Benjamin Otte wrote:
 I have been on a quest to improve performance of the file chooser
 lately. This post is about this process: what I measured, what I
 learned and what I patched.
snip
  * Getting the mime type
 Getting the mime type for a file requires opening the file and
 checking the first bytes against the known patterns for files. Both
 operations take time. The mime type also is usually not required.

It will only open the file if the suffix of the file can't be used,
please read the shared-mime-info spec.
snip
  * Fix the usage of a filter on the mime type. Currently we don't
 query the mime type (it requires file sniffing after all), so we never
 get matches.

As above.

In majority of cases it should be a string match.
snip
 And now the obvious question: Should I just merge it to master when
 I'm done with the regressions?

Personally, I'd rather see the changes merged incrementally, rather than
a wholesale rewrite going in. Makes it much easier to pinpoint what
broke the file chooser (as is likely to happen, it's not like software
is ever bug-free).

Cheers

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: A tale of waiting

2009-06-23 Thread Matthias Clasen
On Tue, Jun 23, 2009 at 4:16 PM, Benjamin Otteo...@gnome.org wrote:

 Next, I looked, where most of the time is taken. These things were standing 
 out:
  * Getting the mime type
 Getting the mime type for a file requires opening the file and
 checking the first bytes against the known patterns for files. Both
 operations take time. The mime type also is usually not required.

As Bastien already pointed out, this is just not true. Getting the
mime type does not use sniffing in most cases. /usr/bin is a the worst
case where extension-based mimetype detection breaks down, but it
should do fine in your homedir, unless you are weird.

 Next I implemented sorting. Because I use an array, I can use qsort(),
 which is fast. I had thought about switching to GSequence to get ever
 closer to GtkListStore, but I did not do that, and one of the reasons
 is that GSequence uses insertion sort, which is quite slow.

Not sure why you keep repeating this, it is not true. GSequence is
backed by a balanced tree, so sorting by insertion gives you O(n log
n), so at least asymptotically,
it is fine. The implementation may of course still be slower than qsort().


 And now the obvious question: Should I just merge it to master when
 I'm done with the regressions?

No, of course not. This needs to be reviewed and merged incrementally.
For one thing,
we don't know what other aspects of file chooser behaviour have been
negatively affected by your optimization towards a single goal. Is
scrolling still ok ? Does changing the sort column work reasonably ?
Etc.

The file chooser has already seen too many rewrites...
___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list