On 29/04/2012 15:42, Colomban Wendling wrote:
* it support asynchronous parsing (though not concurrent parsing);

What's the difference? Also, what does it buy us?

What I meant when saying it's asynchronous but not concurrent is that it
supports launching the parsers in a separate thread, but it cannot
launch several parsers at once.  This is because CTags parsers aren't
thread-safe (have a lot of global states and no locks).

What asynchronous parsing gives us is quite simple: no blocking.  This
means that a slow parser (like e.g. the HTML one on Windows before you
changed the regex library) wouldn't lock Geany.  Same for project
plugins that want to parse thousands of files:  this would still take a
long time, but Geany would be still usable in the meantime.

Ok. It seems a good idea to support this, but for parsing tags in open documents it doesn't seem particularly useful, as this ought to be fast. The regex problem was unusual IMO and due to an old version of GNU regex.

I don't understand why tagmanager has to be replaced, why not just
replace the parts you want to improve? Rewriting it is likely to lead to
a new set of bugs and be hard to review and merge changes from master.

I think Lex and Matthew probably summarize it quite correctly:  it's not
that TM is bad or has to be replaced; it is that (I'm) unable to
understand it enough to fix anything in it, like scope completion.
Maybe it's only me, but I tried hard :)

And the only reason why maybe TagManager would need replacement/large
changes is asynchronous parsing.  I'm not sure how hard it would be to
get it with TM -- but again, maybe it's only that I don't understand it
enough.

Not sure about this, but I think the only messy part of TM is when it updates the workspace array, which I think could be removed (I'll explain this in another reply).

* there is 2 backends as of today:
    - a "simple" one that is simple and that doesn't waste (too much)
      memory, but that searches in O(n);

Linear searching seems like a bad idea when we currently have O(log n).
Removing x tags is O(n * x) by the look of it.

I agree it's not interesting regarding performances, and this backend
isn't meant for it.  I needed an implementation for early testing, so I
wrote it simple.  And it showed useful for testing later on too, since
because of it's simplicity it isn't bugged -- just slow :)

Ok.

BTW, how does TagManager do fast searches?  I always though it did a
sorting with new attributes each time they changed, so in such
situations it's even worse than O(n), isn't it?

For searching, it doesn't do any sorting ever. For adding tags the work object (i.e. document tags) have to be sorted, which I think is good, but also the workspace tags are currently resorted, which I think may be a bad design.

    - a "multi-cache" one that, as its name suggests, maintains multiple
      caches (sorted tags arrays).  it uses a little more memory and is
      slower on insertion since it maintains several sorted lists, but a
      search on an already existing cache is very fast.

Won't this be slow for adding many tags at once? How is this design
better than tagmanager?

Well, adding a tag would require a bsearch() on each cache, yes.
However, adding tags is mostly done in a separate thread (if async
parsing it used), so it can be less of a problem.

I haven't studied your design, but I would prefer that any design is efficient on all threads, so the user's computer can use remaining performance for compiling & whatever else they want.

Also, what about global tags? Those can add a lot of tags all at once.

And a search is simply a bsearch() (O(n log n), right?) given the cache
already exists.  If the cache doesn't exist, it has to be created so
yeah on the first search it's slow.

If it can be slow enough for the user to notice this is probably bad.

I don't see why having two is better. The memory overhead for a pointer
array is not much vs. the size of the tag structures. Fast searching is
important.

Multiple backends isn't really useful probably.  As said above, the
first was mostly a testing one, and I wanted to be able to keep both
during development for better testing.  At one point I also though maybe
there could be some backends optimized for particular situations, like
fast search, fast insertion, low memory consumption, etc., but I don't
see much use for this anymore.

Ok.

* tags (and most types) are reference-counted (so they aren't
    necessarily duplicated, e.g. in the multicache backend);

I don't really understand src/symbols.c since the real-time parsing
change, so don't really understand why this is needed.

BTW here I meant reparsing document tags without saving, sorry for any confusion.

It's not strictly needed, but it makes some memory management easier,
and fits well with GTK memory management.  And this last point helps a
lot to maintain the GtkTreeStore on src/symbols.c, now tags are updated
and not removed.
But that's not new, I already added this in TM.

Yes. Is the reason the tree should be updated and not recreated to preserve fold states and scrollbar position? In fact I'm a bit confused about this, how come old tags are still accessed after reparsing the document with new tags?

BTW, what don't you understand in symbols.c?  Maybe I could improve the
docs.

It just seems to have grown overcomplex - basically the reparsing change means much more code to manage the symbol tree. Isn't there a second hash table besides the parent tag hash table?

It's possible tagmanager needs better abilities so that the symbol list code doesn't need so much complexity.

* tag matching/finding is done using ctm_data_backend_find() (or a
    convenience wrapper), which takes 2 functions for performing the
    search:
    - a sort function, used to, heh, sort the tags to search and/or the
      resulting list (the "simple" backend should use it to sort the
      result; and the "multi-cache" backend uses it to sort the caches)

It doesn't sound a good idea to have to sort the haystack on each
lookup. Lookup should be very fast, like tagmanager. Adding/removing
tags is always less important because real-time updates can be disabled
anyway.

That's why the multicache backend (the one supposed to be somewhat fast)
only creates new cache on lookup if there isn't compatible cache yet,
which should happen roughly only once, on the first lookup.

Ok.

However, I still don't understand how TM does address the issue, and I
though it actually re-sorted the array on lookup if the attributes
changed.  And IIUC too, the attributes may change often if performing
search A and the B, then A again, the A, B, A, B and so forth.

The arrays are always sorted the same way, and only after tag creation. The search results are filtered by the lookup attributes.

All this isn't of course written in stone: if we already redo a lot of
things, let's get something nice -- though IMO we'll always be better
than with tagmanager, since each time I wanted to touch it it took me
hours, and sometimes I even haven't been able to fix the problem
(thinking of e.g. scope completion...).

I don't really see what the problem understanding it is. I thought scope
completion was just tm_workspace_find_scoped and related functions, not
some tagmanager-wide problem.

It's probably not some TM-wide problem, but I tried to understand what
was the problem, but never got to understand that thing.  And certainly
not how I would add some more precision in the scope search, like same
file, close to the position, etc.

I think if a tag array was sorted by scope first, then by name, then the global name lookup still works as before, but allows O(log n) lookup of all tags with matching scope. So you lookup those scope tags, then you sort the results according to how you like, e.g. line number.

To sort by file I think you would use tmtag->atts.entry.file->work_object.file_name

I finished the global tags merging change:
https://github.com/geany/geany/commit/712cdd6aa0005f65bd77b15c34cd29067864628d


IMO this should make loading global tags fast enough, even for heavy
use, as the whole array is now only traversed once.

That's a really good idea yeah, good job :)

For avoiding resorting of workspace tags when only reparsing one source
object, we can remove the source object's old tags&  merge the new tags.
This is all O(n) for the workspace array. I haven't looked into
implementing this yet because now it's clear you're working on a
competing change.

I'm not sure how it is for now (actually i didn't check just now), but
yes it looks sensible.

I think removing the workspace array is probably better, but haven't investigated completely. I'll reply in the other email.

The other changes I've made were in response to a user's problem with
generating C tag files, and some changes to the CTags files.

Good.  However not stripping G_BEGIN_DECLS confuses the CTags parser and
makes particularly makes it think the first identifier after it has
"G_BEGIN_DECLS" type; which then may cause problems for e.g. scope
completion.

Are you sure? I tested this and it seemed Ok as my change adds those tags to the c_ignore_tags automatically.
_______________________________________________
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel

Reply via email to