Re: [Geeqie-devel] On refcounting

2011-05-07 Thread Omari Stephens
On 05/07/2011 09:20 AM, Vladimir Nadvornik wrote:
> Hi,
>
> On Wed, 4 May 2011, Omari Stephens wrote:
>> Howdy, all
>>
>> I'm poking around with the marks code.  Currently, marks are _really
>> slow_ when you connect at least one mark to an image tag — when an image
>> with no marks drops to zero refs, we lose track of what marks it may
>> have, and on the next marks filtering op, we end up having to hit the
>> XMP for every file in the directory.
>>
>
> This analysis is not exactly right.
>
> 1. All files in the current directory are held in ViewFile->list, so they
> can't have zero refs.
>
> 2. When there is a connection between a mark and a keyword, the mark value is
> no longer taken from FileData; instead, file_data_get_mark() calls
> file_data_get_mark_func and uses its result. The value in FileData is then
> updated, but it is important only for keeping the last value after the
> connection is switched off.
>
> 3. file_data_get_mark_func points to meta_data_get_keyword_mark, which then
> calls metadata_read_list(). That uses cached metadata when available,
> or actually reads it from the file.
>
> So IMHO the problem appears when the number of files in the directory is
> higher than the number of files with cached metadata. Caching all metadata is
> quite expensive (easily 100KB per file), so I would fix this by implementing
> another long-term cache for selected metadata entries (keywords) in
> metadata_read_list().

Hmm.  So the observation I made is that with the stock code, for all 
files _except_ for those with some marks set, fd->valid_marks is reset 
to 0 each time we enter file_data_get_mark.  That automatically means 
that fd->marks is a cache miss and we hit file_data_get_mark_func as you 
mentioned.

If I do a single unbalanced file_data_ref_fd(fd) after sticking a valid 
value into the fd->marks cache, then we get cache hits on fd->marks 
every time thereafter.  So there's something that's causing the cached 
marks to get GCed, which doesn't happen when I do a single extra ref 
increment.

That said, I'll print out the actual refcount or something to try to 
figure out what's going on.  Also, I wasn't aware that there was another 
metadata cache; I'll go poke at that.

--xsdg

--
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network 
management toolset available today.  Delivers lowest initial 
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
___
Geeqie-devel mailing list
Geeqie-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geeqie-devel


Re: [Geeqie-devel] On refcounting

2011-05-07 Thread Michael Schwendt
On Sat, 7 May 2011 11:41:00 +0200, VN wrote:

> > FWIW, and considering that the current refcounting in Geeqie is
> > troublesome, you've got my blessing. :-) I'm all for a central list of
> > the current directory's file data objects, with added maintenance
> > structures on top of that.
> 
> What do you mean by _central_? There is ViewFile->list for each window,
> but then there are standalone files too.

Central = a global list for each view. With a manager that maintains the
list (including the refreshing of the directory contents if external tools
modify the dir), the parent/child relationships, the references to
additional data caches, and the freeing of the FileData objects. The
manager would have the ultimate decision power whether and when to free
memory and to remove something from the list.

> > Additionally, the FileData unref method is questionable already (not just
> > due to its fd->magick checking and special handling of child nodes), since
> > for example it does
> > 
> > fd->ref--;
> > 
> > if (fd->ref == 0)
> > {
> > /* decide whether to file_data_free() or return earlier */
> > }
> > return;
> > 
> > and hence doesn't cover the case that ref<0, and the responsibility to
> > drop ptrs to freed FileData from any lists or locals, is not within this
> > method.
> > 
> 
> file_data_free is called when the file and its parent and children all have
> a zero ref count. Then it frees the whole group.

Who ensures that it is called like that?
The current implementation bails out if these constraints aren't met,
but the refcount is decreased nevertheless, as that's the first thing the
unref method does.  So far, several of the unref method's assertions have
aborted Geeqie (even sidecarfiles==NULL has happened due to the file
grouping issues), and the bugs that mess with the FileData objects are
elsewhere in the code, not in the unref method.

> ref<0 should not happen. The only possible handling in the unref function is
> to write an error message and crash.
>
> There may be some missing ref/unref calls in the code, but this is fixable.
> At the moment I don't think we should change the infrastructure.

I'm more concerned about too many unref calls _and_ unref calls for
invalid FileData pointers. That users sometimes manage to crash Geeqie
with the fd->magick=0x12345678 assertion failing is evidence of either
an unref of freed FileData or memory corruption. Currently I'm trying out
a simplified ViewFile tree lister in an attempt to track down further
file_data_unref breakage.



Re: [Geeqie-devel] On refcounting

2011-05-07 Thread Vladimir Nadvornik
On Thu, 5 May 2011, Michael Schwendt wrote:
> On Wed, 04 May 2011 16:30:44 +, OS wrote:
> 
> > That said, I imagine that we'd be better off just always holding a ref 
> > for every FileData in the current directory.  On my machine, a FileData 
> > is 176 bytes and a GList is 24 bytes, so for a directory of 20,000 
> > images, we'd be spending 3.8MB holding FileDatas.  That's nothing. 
> 
> > Thoughts?
> 
> FWIW, and considering that the current refcounting in Geeqie is
> troublesome, you've got my blessing. :-) I'm all for a central list of
> the current directory's file data objects, with added maintenance
> structures on top of that.

What do you mean by _central_? There is ViewFile->list for each window,
but then there are standalone files too.

> 
> Even with the couple of patches I've added to the tracker, there's at
> least one code path that sometimes (but rarely) unrefs FileData and crashes.
> One fundamental problem I see is that the code works on plain C pointers
> to FileData structs, which may point at freed memory. With unref calls
> spread throughout the code and GLists of pointers in some places, that
> simply bears the risk that unref and g_list remove can get out of sync.
> Additionally, the FileData unref method is questionable already (not just
> due to its fd->magick checking and special handling of child nodes), since
> for example it does
> 
>   fd->ref--;
> 
>   if (fd->ref == 0)
> {
>   /* decide whether to file_data_free() or return earlier */
> }
> return;
> 
> and hence doesn't cover the case that ref<0; likewise, the responsibility
> for dropping pointers to freed FileData from any lists or locals does not
> lie within this method.
> 

file_data_free is called when the file and its parent and children all have
a zero ref count. Then it frees the whole group.

ref<0 should not happen. The only possible handling in the unref function is
to write an error message and crash.

There may be some missing ref/unref calls in the code, but this is fixable.
At the moment I don't think we should change the infrastructure.

Vladimir




Re: [Geeqie-devel] On refcounting

2011-05-07 Thread Vladimir Nadvornik
Hi,

On Wed, 4 May 2011, Omari Stephens wrote:
> Howdy, all
> 
> I'm poking around with the marks code.  Currently, marks are _really 
> slow_ when you connect at least one mark to an image tag — when an image 
> with no marks drops to zero refs, we lose track of what marks it may 
> have, and on the next marks filtering op, we end up having to hit the 
> XMP for every file in the directory.
> 

This analysis is not exactly right.

1. All files in the current directory are held in ViewFile->list, so they
can't have zero refs.

2. When there is a connection between a mark and a keyword, the mark value is
no longer taken from FileData; instead, file_data_get_mark() calls
file_data_get_mark_func and uses its result. The value in FileData is then
updated, but it is important only for keeping the last value after the
connection is switched off.

3. file_data_get_mark_func points to meta_data_get_keyword_mark, which then
calls metadata_read_list(). That uses cached metadata when available,
or actually reads it from the file.

So IMHO the problem appears when the number of files in the directory is
higher than the number of files with cached metadata. Caching all metadata is
quite expensive (easily 100KB per file), so I would fix this by implementing
another long-term cache for selected metadata entries (keywords) in
metadata_read_list().

Vladimir





Re: [Geeqie-devel] On refcounting

2011-05-05 Thread Michael Schwendt
On Wed, 04 May 2011 16:30:44 +, OS wrote:

> That said, I imagine that we'd be better off just always holding a ref 
> for every FileData in the current directory.  On my machine, a FileData 
> is 176 bytes and a GList is 24 bytes, so for a directory of 20,000 
> images, we'd be spending 3.8MB holding FileDatas.  That's nothing. 

> Thoughts?

FWIW, and considering that the current refcounting in Geeqie is
troublesome, you've got my blessing. :-) I'm all for a central list of
the current directory's file data objects, with added maintenance
structures on top of that.

Even with the couple of patches I've added to the tracker, there's at
least one code path that sometimes (but rarely) unrefs FileData and crashes.
One fundamental problem I see is that the code works on plain C pointers
to FileData structs, which may point at freed memory. With unref calls
spread throughout the code and GLists of pointers in some places, that
simply bears the risk that unref and g_list remove can get out of sync.
Additionally, the FileData unref method is questionable already (not just
due to its fd->magick checking and special handling of child nodes), since
for example it does

    fd->ref--;

    if (fd->ref == 0)
    {
        /* decide whether to file_data_free() or return earlier */
    }
    return;

and hence doesn't cover the case that ref<0; likewise, the responsibility
for dropping pointers to freed FileData from any lists or locals does not
lie within this method.



[Geeqie-devel] On refcounting

2011-05-04 Thread Omari Stephens
Howdy, all

I'm poking around with the marks code.  Currently, marks are _really 
slow_ when you connect at least one mark to an image tag — when an image 
with no marks drops to zero refs, we lose track of what marks it may 
have, and on the next marks filtering op, we end up having to hit the 
XMP for every file in the directory.

My current plan is to just ref every FileData in the directory if marks 
are turned on (and then unref them all when we change directories), 
which means that marks filter operations will be hitting an in-memory 
cache rather than having to go to disk _and_ reparse all the XML.

That said, I imagine that we'd be better off just always holding a ref 
for every FileData in the current directory.  On my machine, a FileData 
is 176 bytes and a GList node is 24 bytes, so for a directory of 20,000 
images, we'd be spending 3.8MB holding FileDatas.  That's nothing.  In 
turn, it'd mean that we could suddenly sort or filter by any XMP or EXIF 
tag and it'd be fast, without having to do any refcount special-casing. 
We'd hit the disk at most once for every FileData in the directory, 
and that's it.

Thoughts?

--xsdg
