Moving this back to pkg-discuss.

Michal Pryc wrote:
Brock,
I completely agree with you that it is currently quite hard to find a package across repositories, but you probably remember that people were saying that showing all packages from all repositories will not scale; that is why we took another approach, which I believe you will find reasonable. This involves a few more changes, which I didn't want to introduce all at once. Maybe I should have been more precise about the planned work.
Right, I agree that showing all packages at once (by default) isn't the right approach. I don't think I agree that splitting by repository is the right answer, though. Especially with the pending change to the category system and package namespace, it seems to me that mirroring that structure would be the right interface: essentially replacing the current category drop-down, category pane, and list view with a tree that lets a user drill down through several levels of categories quickly. I'm fine with having a repository drop-down to select repos, but the default for that box should be "all". If I'm looking for a package, I'd expect the choice of repository to be the lowest level in the hierarchy, not the top level.
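
To be concrete about what I'm picturing (just a rough pygtk sketch, with made-up category and package names, not a proposal for the actual code):

import gtk

# Rough sketch: categories form the tree, and the repository only shows
# up at the bottom, attached to the individual package entries.  The
# names here are invented for illustration.
store = gtk.TreeStore(str)

dev = store.append(None, ["Development"])
langs = store.append(dev, ["Languages"])
python_cat = store.append(langs, ["Python"])
store.append(python_cat, ["SUNWPython (release)"])
store.append(python_cat, ["SUNWPython (dev)"])

view = gtk.TreeView(store)
view.append_column(
    gtk.TreeViewColumn("Packages", gtk.CellRendererText(), text=0))

The point is just that drilling down happens along category lines, with the repository as a leaf-level detail (or a drop-down that defaults to "all").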

So currently we have decided to:

 - split the data models per authority/repository. This will improve the
performance of the refilter/search functions and of switching between categories, and it also simplifies the filter functions a bit. The problem starts when users add the blastwave, pending, dev, sunfreeware and contrib repos: then our data model contains all of those packages and the GUI search becomes very slow.
I agree that the complete-as-you-type search can't scale as currently implemented. I don't agree that this solution is anything more than a patch to tide us over for a little while, and I think making a significant shift in how you organize the data for a short-term fix is a questionable decision. Release currently has 20k packages, dev has 25k, and pending has 11k, so splitting per repository reduces the problem by roughly a factor of three (really more like factors of 2, 2, and 5). I'm not sure that feature is worth reworking the organizational structure of your code for. Those repos will grow, so unless I've missed something, we've just pushed the day of reckoning off by a few months (or years, depending on their growth rates). It's quite possible that using the filters from gtk.ListStore isn't going to scale going forward. I'm not familiar with the code backing that function, nor am I certain that's where the slowdown is, but I seem to remember that being how it was implemented.
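
To spell out the guess I'm making: if the search-as-you-type is a gtk.ListStore with a TreeModelFilter visible-func on top (and that is an assumption on my part), then every keystroke re-walks every row, roughly like this:

import gtk

# Assumed shape of the current filtering, not the actual code: a flat
# ListStore with a visible-func filter.  refilter() re-evaluates the
# function for every row, so each keystroke costs O(number of packages).
store = gtk.ListStore(str)
for name in ("SUNWbash", "SUNWgcc", "SUNWPython"):
    store.append([name])

current_query = [""]          # mutable holder for the text in the search bar

def visible(model, it, data=None):
    return current_query[0].lower() in model.get_value(it, 0).lower()

filtered = store.filter_new()
filtered.set_visible_func(visible)

def on_search_changed(text):
    current_query[0] = text
    filtered.refilter()       # walks all rows again, split per repo or not

If that's right, splitting by repo just divides the constant; it doesn't change the shape of the curve.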

If you really want search-as-you-type, backing the filter/search with a tree-based, rather than linear, search (just my guess as to how it's implemented, given the scaling concerns described) might solve the problem in a way that would scale going forward. Each node could even know how many children live beneath it and decide whether or not attempting to populate all of the results at once makes sense. Fun games could also be played by combining a generator-based search with a list storing the results already seen, so that results stream into the view rather than waiting for all of them to come in before displaying anything.
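
For example (a pure sketch, prefix matching only, and nothing to do with the existing code), something like this would give both the per-node counts and the streaming behaviour I mean:

class TrieNode(object):
    def __init__(self):
        self.children = {}   # next letter -> child node
        self.count = 0       # number of packages at or below this node
        self.names = []      # package names ending exactly here

def insert(root, name):
    node = root
    node.count += 1
    for ch in name.lower():
        node = node.children.setdefault(ch, TrieNode())
        node.count += 1
    node.names.append(name)

def search(root, prefix):
    """Generator: yield matches one at a time so they can stream in."""
    node = root
    for ch in prefix.lower():
        node = node.children.get(ch)
        if node is None:
            return
    # node.count says up front how many results are coming, so the GUI
    # can decide whether populating them all at once makes sense.
    stack = [node]
    while stack:
        n = stack.pop()
        for name in n.names:
            yield name
        stack.extend(n.children.values())

# Usage: append results to the view as they arrive instead of waiting.
root = TrieNode()
for pkg in ("SUNWbash", "SUNWbzip", "SUNWPython"):
    insert(root, pkg)
seen = []
for match in search(root, "sunwb"):
    seen.append(match)       # e.g. add a row to the list here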


- add cPickle caching to the data models. This will simply dump the data structures backing the gtk.ListStore and read them back on startup, after making sure the cache is in sync with the current catalog and package states; if not, the list will be gathered as it is now. The cache will be per repository, so switching between repositories will read the corresponding caches. I already have a working version of this part, where the GUI starts in ~3 seconds for the dev repository, compared to ~24 seconds currently.
This makes perfect sense to me, except I'd pickle the info for all the repos, or pickle by category instead of by repo.
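
For what it's worth, the shape I'd expect that cache to take, whichever way you slice it (file name, timestamp field, and helper names all invented here):

import cPickle
import os

CACHE_FILE = os.path.expanduser("~/.packagemanager/pkglist.cache")  # invented

def save_cache(rows, catalog_timestamp):
    """Dump the rows backing the gtk.ListStore together with the
    catalog timestamp they were built from."""
    f = open(CACHE_FILE, "wb")
    try:
        cPickle.dump((catalog_timestamp, rows), f, cPickle.HIGHEST_PROTOCOL)
    finally:
        f.close()

def load_cache(catalog_timestamp):
    """Return the cached rows if they match the current catalog,
    otherwise None so the caller falls back to building the list."""
    if not os.path.exists(CACHE_FILE):
        return None
    f = open(CACHE_FILE, "rb")
    try:
        cached_ts, rows = cPickle.load(f)
    finally:
        f.close()
    if cached_ts != catalog_timestamp:
        return None          # stale: gather the list as it is done now
    return rows

Whether the key is the repo, the category, or both is then just a question of what goes into the file name.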

- in the all-repositories view, rather than showing all packages, we will perform a remote search. This will allow users to find the desired packages and will be scalable at the same time (please scroll down to see *all repositories*): http://xdesign.sfbay.sun.com/projects/solaris/subprojects/package_mngt/UI_specs/ui_spec_phase3/html-mockup/12_search_parameters_r4.htm

What if I just want to browse packages from all repos? Do I search for * or '' and then look at the search results window? I agree that you should be able to do a remote search over all packages; I just don't think it's in any way intuitive that doing so means an entirely different search algorithm is now in use. For example, suppose I search for 'foo' across all repos and find that it maps to "SUNWbar". I would then expect that if I searched the repos individually, at least one repo would return the same result for the same query, but unless I'm confused about how your search is planned to work, that won't happen, because searching a single repo employs an entirely separate algorithm.
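
Naively, what I'd expect is that the repository selection only narrows where one and the same search runs, along these (entirely hypothetical) lines:

# Entirely hypothetical interface.  The point is only that "all" and a
# single repository go through the same code path, so 'foo' -> SUNWbar
# in the all-repositories view implies 'foo' -> SUNWbar in at least one
# individual repository.
def search_one_repo(query, package_names):
    q = query.lower()
    return [name for name in package_names if q in name.lower()]

def search_packages(query, catalogs):
    """catalogs: dict mapping repo name -> list of package names."""
    results = {}
    for repo, package_names in catalogs.items():
        matches = search_one_repo(query, package_names)
        if matches:
            results[repo] = matches
    return results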

While we're talking about search, I'll mention that you might want to start thinking about what an advanced search tool might look like in the GUI. If you want to make simplifying assumptions about what the user is looking for when they use the built-in search bar, that's fine, but a more robust tool that lets users execute the same queries they could from the command line would probably be a nice feature.
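
As a strawman, the advanced-search pane could just build the same invocation a user would type at the command line and run it. I'm assuming here that 'pkg search -r <token>' is (still) the remote-search form, so treat the flag as a placeholder:

import subprocess

def run_advanced_search(token, remote=True):
    """Strawman only: hand the assembled query to the pkg CLI so the
    GUI and the command line always agree on results."""
    cmd = ["pkg", "search"]
    if remote:
        cmd.append("-r")      # assumed remote-search flag; see above
    cmd.append(token)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.splitlines()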

Brock
best
Michal

[snip]

