Re: [gentoo-portage-dev] search functionality in emerge

tvali Sun, 23 Nov 2008 12:00:34 -0800

Yes if it would be a low-level implementation to portage, speeding up it's
native code for searching and using indexes, then it would make everything
faster, including emerge (because emerge does search first for package
relations). I have actually wanted to do it myself several years ago, so
reacting here to have my ideas discussed, too.


Douglas Anderson 16:46 reply is about locks and I think that it would need
to rethink portages locking methods - what, when and why it locks. This is
probably quite hard task by itself. Anyway, as portage actually lets user
make two emerges at the same time, locks might be OK, too.

I think that the best thing would be bottom-up refactoring - first to make a
list of lowest-level functions, which have to do with reading data from
portage tree or writing into it; then making indexer class, which will be
used by all of those low-level functions.

To have it OOP, it should be implemented in such way:

   - Low-level portage tree handler does everything with portage tree, no
   function in portage uses it directly.
   - Tree handler has several needed and several optional methods - so that
   implementing new handler would be easy, but creating things like native
   regex search would be possible.
   - One could implement a new tree handler with SQLite or other interface
   instead of filesystem and do other tricks through this interface; for
   example, boost it.

So, nice way to go would be:

   1. Implementing portage tree handler and it's proxy, which uses current
   portage tree in non-indexed way and simply gives methods for the same kind
   of access, as currently implemented one.
   2. Refactoring portage to rely only on portage tree handler and use
   direct portage tree nowhere. To test if it is so, portage tree should be
   moved to another directory, about which only this handler knows, and check
   if portage works well. Indexing all places, where portage uses it's tree
   handler (by std. comment, for example) and making clear, which methods would
   contain all boostable code of it.
   3. Implementing those methods in proxy, which could simulate fast regex
   search and other stuff using simplest possible interface of portage tree
   handler (smth. like four methods add, remove, get, list). Proxy should be
   able to use handler's methods if they are implemented.
   4. Refactoring portage to use advanced methods in proxy.
   5. Now, having taken all code together into one place and looking this
   nice and readable code, real optimizations could be discussed here, for
   example. Ideally, i think, portage would have such tree handlers:
      - Filesystem handler - fast searches over current portage tree
      structure.
      - SQL handler - rewrite of tree functions into SQL queries.
      - Network-based handler - maybe sometimes it would nice to have
      portage tree only in one machine of cluster, for example if I
want to have
      100 really small computers with fast connection with mother-computer and
      portage tree is too big to be wholly copied to all of them.
      - Memory-buffered handler with daemon, which is actually proxy to some
      other handler - daemon, which reads whole tree (from filesystem
or SQL) into
      memory on boot or first use, creates really fast index (because
now it does
      matter to have better indexing) and optionally deletes some [less needed]
      parts of it's index from memory when it's becoming full and behaves as
      really simple proxy if it stays full. This should be implemented after
      critical parts of filesystem or SQL handler.

2008/11/23 Emma Strubell <[EMAIL PROTECTED]>

> Wow, that's extremely helpful!! I happen to particularly enjoy tries, so
> the suffix trie sounds like a great idea. The trie class example is really
> helpful too, because this will be my first time programming in Python, and
> it's a bit easier to figure out what's going on syntax-wise in that simple
> trie class than in the middle of the portage source code!
>
> Seriously, thanks again :]
>
>
> On Sun, Nov 23, 2008 at 11:56 AM, Lucian Poston <[EMAIL PROTECTED]>wrote:
>
>> > Thanks for the replies! I know there are a couple programs out there
>> that
>> > basically already do what I'm looking to do... Unfortunately I wasn't
>> aware
>> > of these pre-existing utilities until after I submitted my project
>> proposal
>> > to my professor. So, I'm looking to implement a better search myself.
>> > Preferably by editing the existing portage code, not writing a separate
>> > program. So if anyone can offer any help regarding the actual
>> implementation
>> > of search in portage, I would greatly appreciate it!
>>
>> Most of the search implementation is in
>> /usr/lib/portage/pym/_emerge/__init__.py in class search.  The class's
>> execute() method simply iterates over all packages (and descriptions
>> and package sets) and matches against the searchkey.  You might need
>> to look into pym/portage/dbapi/porttree.py for portdbapi as well.
>>
>> If you intend to index and support fast regex lookup, then you need to
>> do some fancy indexing, which I'm not terribly familiar with.  You
>> could follow in the steps of eix[1] or other indexed search utilities
>> and design some sort of index layout, which is easier than the
>> following thought.  You might consider implementing a suffix trie or
>> similar that has sublinear regexp lookup and marshalling the structure
>> for the index.  I couldn't find a python implementation for something
>> like this, but here is a general trie class[2] that you might start
>> with if you go that route.  There is a perl module[3],
>> Tie::Hash::Regex, that does that, but properly implementing that in
>> python would be a chore. :)
>>
>> That project sounds interesting and fun. Good luck!
>>
>> Lucian Poston
>>
>> [1] https://projects.gentooexperimental.org/eix/wiki/IndexFileLayout
>> [2]
>> http://www.koders.com/python/fid7B6BC1651A9E8BBA547552FE3F039479A4DECC45.aspx
>> [3]
>> http://search.cpan.org/~davecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm<http://search.cpan.org/%7Edavecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm>
>>
>>
>


-- 
tvali

Kuskilt foorumist: http://www.cooltests.com - kui inglise keelt oskad.
Muide, üle 120 oled väga tark, üle 140 oled geenius, mingi 170 oled ju mingi
täica pea nagu prügikast...

Re: [gentoo-portage-dev] search functionality in emerge

Reply via email to