By the way, one way to handle index updates would be:

emerge --sync:
- Also deletes the index if it exists
emerge --searchindexon:
- Turns the index on
emerge --searchindexoff:
- Turns the index off
- Deletes the index if it exists
emerge -s or emerge -S:
- Does a normal search if the index is off
- If the index is on and does not exist, creates it on the fly
- If the index is on and exists, uses it

Tambet - technique evolves to art, art evolves to magic, magic evolves to just doing.


2008/12/2 Tambet <[EMAIL PROTECTED]>

> About zipping: the default settings might not be a good idea; I think
> that "fastest" might be even better. Since the portage tree contains the
> same words again and again (like "applications"), it needs only a pretty
> small dictionary to make it much smaller. Decompressing will not involve
> reading from disk, decompressing and writing back to disk, as it probably
> did in your case; try decompressing to a memory drive and you might get
> better numbers.
>
> I have personally used compression in one C++ application, and with optimal
> settings it made things much faster. Those were files where I had, for
> example, 65536 16-byte integers, which could be zeros and mostly were; I
> didn't care about creating a better file format, but just compressed the
> whole thing.
>
> I suggest you compress the esearch db, then decompress it to a memory drive
> and give us those numbers; it might be considerably faster.
>
> http://www.python.org/doc/2.5.2/lib/module-gzip.html - Python gzip
> support. Try opening the esearch db with it and with a normal open; also
> compress it with the same lib to get the right kind of file.
>
> Anyway, maybe this compression should be added later, as an option.
>
> Tambet - technique evolves to art, art evolves to magic, magic evolves to
> just doing.
>
>
> 2008/12/2 Alec Warner <[EMAIL PROTECTED]>
>
>> On Mon, Dec 1, 2008 at 4:20 PM, Tambet <[EMAIL PROTECTED]> wrote:
>> > 2008/12/2 Emma Strubell <[EMAIL PROTECTED]>
>> >>
>> >> True, true. Like I said, I don't really use overlays, so excuse my
>> >> ignorance.
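The index lifecycle proposed at the top of this message could look roughly like this in Python, the language portage is written in. This is a minimal sketch under stated assumptions, not portage code: the `SearchIndex` class, its JSON file format, and the `build_index` and `slow_search` callables are hypothetical stand-ins, and only package names are matched.

```python
import json
import os
import tempfile


class SearchIndex:
    """Hypothetical on-disk search index with the proposed on/off semantics."""

    def __init__(self, path, enabled=False):
        self.path = path
        # Off by default; a real tool would persist this flag in its config.
        self.enabled = enabled

    def _delete(self):
        # Remove the index file if it exists.
        if os.path.exists(self.path):
            os.remove(self.path)

    def on_sync(self):
        # emerge --sync: the tree has changed, so drop the stale index.
        self._delete()

    def turn_on(self):
        # emerge --searchindexon
        self.enabled = True

    def turn_off(self):
        # emerge --searchindexoff: disable the index and free the disk space.
        self.enabled = False
        self._delete()

    def search(self, term, build_index, slow_search):
        # emerge -s / emerge -S
        if not self.enabled:
            return slow_search(term)  # index off: normal search
        if not os.path.exists(self.path):
            # Index on but missing: create it on the fly.
            with open(self.path, "w") as f:
                json.dump(build_index(), f)
        with open(self.path) as f:
            index = json.load(f)  # maps "category/package" -> description
        return sorted(cp for cp in index if term in cp)


# Demo with a toy two-package "tree" standing in for the real portage tree.
tree = {"app-editors/vim": "Vim text editor", "app-editors/nano": "GNU nano"}


def slow_search(term):
    return sorted(cp for cp in tree if term in cp)


idx = SearchIndex(os.path.join(tempfile.mkdtemp(), "search.idx"))
print(idx.search("vim", lambda: tree, slow_search))  # ['app-editors/vim'] via the slow path
idx.turn_on()
print(idx.search("vim", lambda: tree, slow_search))  # ['app-editors/vim'] via the new index
idx.on_sync()                                         # a sync deletes the index again
```

Keeping "off" as the default and deleting the file on both sync and `--searchindexoff` means a stale index can never be used; the cost is one rebuild on the first search after each sync.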
>> >
>> > Do you know the order of doing things?
>> >
>> > Rules of Optimization:
>> >
>> > Rule 1: Don't do it.
>> > Rule 2 (for experts only): Don't do it yet.
>> >
>> > What this actually means: functionality comes first. Readability comes
>> > next. Optimization comes last. Unless you are creating a fancy 3D engine
>> > for a kung fu game.
>> >
>> > If you are going to exclude overlays, you are removing functionality,
>> > and indeed absolutely has-to-be-there functionality, because no one would
>> > intuitively expect a search function to search only one subset of
>> > packages, however reasonable that subset might be. So you can't, just
>> > can't, add this package to portage base; you could only write yet another
>> > external search package for portage.
>> >
>> > I looked at this code a bit, and:
>> > Portage's "__init__.py" contains the comment "# search functionality".
>> > After this comment, there is a nice and simple search class.
>> > It also contains the method "def action_sync(...)", which contains the
>> > synchronization stuff.
>> >
>> > Now, the search class is initialized by setting up 3 databases: porttree,
>> > bintree and vartree, whatever those are. Those will be in the self._dbs
>> > array, and porttree will be in self._portdb.
>> >
>> > It contains some more methods:
>> > _findname(...) will return the result of self._portdb.findname(...) with
>> > the same parameters, or None if it does not exist.
>> > Other methods do similar things: each maps to one method or another.
>> > execute will do the real search...
>> > Now, "for package in self.portdb.cp_all()" is important here: it
>> > currently loops over the whole portage tree. All kinds of matching are
>> > done inside.
>> > self.portdb obviously points to porttree.py (unless it points to a fake
>> > tree).
>> > cp_all will take all porttrees and do a simple file search inside. This
>> > method should contain the optional index search.
>> >
>> > self.porttrees = [self.porttree_root] + \
>> >     [os.path.realpath(t) for t in self.mysettings["PORTDIR_OVERLAY"].split()]
>> >
>> > So self.porttrees contains a list of trees: the first of them is the root,
>> > the others are overlays.
>> >
>> > Now, what you have to do will not be any harder just because of having
>> > overlay search, too.
>> >
>> > You have to create a method def cp_index(self), which will return a
>> > dictionary containing package names as keys. In the "for oroot..." loop,
>> > the list will be "self.porttrees[1:]", not "self.porttrees"; this will
>> > only search overlays. d = {} will be replaced with d = self.cp_index().
>> > If the index is not there, the old version will be used (thus, you have to
>> > make an internal porttrees variable, which contains either all trees or
>> > all except the first).
>> >
>> > Other methods used by search are xmatch and aux_get; the first is used
>> > several times and the last one is used to get the description. You have
>> > to cache the results of those specific queries and make them use your
>> > cache; as you can see, those parts of portage are already able to use
>> > overlays. Thus, you again have to put your code at the beginning of those
>> > functions: create index_xmatch and index_aux_get methods, then make
>> > those methods use them and return their results unless those are None (or
>> > something else, in case None is already a legal result). If they return
>> > None, the old code will run and do its job.
>> > If the index is not created, the result is None. In the index_* methods,
>> > just check whether the query is one you can answer, and if it is, answer
>> > it.
>> >
>> > Obviously, the simplest way to create your index is to delete the index,
>> > then use those same methods to query for all necessary information; the
>> > fastest way would be to add index updating directly into sync, which you
>> > could do later.
>> >
>> > Please also make commands to turn the index on and off (the latter should
>> > also delete it to save disk space).
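The index-first pattern Tambet describes can be sketched with a toy class. Everything here is hypothetical and simplified: `IndexedPortdb` is not portage's `portdbapi`, the index is an in-memory dict mapping package names to descriptions, and `aux_get` takes a single key rather than the versioned package and key list the real method takes. It shows `cp_all` seeding `d` from `cp_index()` and scanning only `self.porttrees[1:]`, and `aux_get` trying `index_aux_get` first and falling back to the old code when it returns None.

```python
import os
import tempfile


class IndexedPortdb:
    """Toy stand-in for a port tree database, showing index-first lookups."""

    def __init__(self, porttrees, index=None):
        self.porttrees = porttrees  # [main tree, overlay1, overlay2, ...]
        self._index = index         # {"cat/pkg": description}, or None if not built

    @staticmethod
    def _scan_tree(root):
        # Simple file search: each <category>/<package> directory is a package.
        # (A real tree also has eclass/, profiles/ etc., so a real implementation
        # would use the category list instead of scanning every directory.)
        found = set()
        for cat in os.listdir(root):
            cat_dir = os.path.join(root, cat)
            if not os.path.isdir(cat_dir):
                continue
            for pkg in os.listdir(cat_dir):
                if os.path.isdir(os.path.join(cat_dir, pkg)):
                    found.add(cat + "/" + pkg)
        return found

    def cp_index(self):
        # Main-tree package names as dict keys, or None without an index.
        if self._index is None:
            return None
        return dict.fromkeys(self._index)

    def cp_all(self):
        d = self.cp_index()
        if d is None:
            d, trees = {}, self.porttrees   # old behaviour: scan every tree
        else:
            trees = self.porttrees[1:]      # the index covers the main tree
        for oroot in trees:
            for cp in self._scan_tree(oroot):
                d[cp] = None
        return sorted(d)

    def index_aux_get(self, cp, key):
        # Answer only what the index knows; None means "run the old code".
        if self._index is None or key != "DESCRIPTION":
            return None
        return self._index.get(cp)

    def _slow_aux_get(self, cp, key):
        # Placeholder for the original metadata lookup.
        return "(looked up the slow way)"

    def aux_get(self, cp, key):
        result = self.index_aux_get(cp, key)
        if result is not None:
            return result
        return self._slow_aux_get(cp, key)


# Demo: a main tree with one package and an overlay with another.
root = tempfile.mkdtemp()
for rel in ("main/app-editors/vim", "overlay/app-misc/foo"):
    os.makedirs(os.path.join(root, rel))
trees = [os.path.join(root, "main"), os.path.join(root, "overlay")]

no_index = IndexedPortdb(trees)
indexed = IndexedPortdb(trees, index={"app-editors/vim": "Vi IMproved"})
print(no_index.cp_all())  # ['app-editors/vim', 'app-misc/foo']
print(indexed.cp_all())   # same list, but only the overlay was scanned
print(indexed.aux_get("app-editors/vim", "DESCRIPTION"))  # Vi IMproved
```

Because `index_aux_get` returns None for anything it cannot answer, a missing or partial index only costs speed, never correctness; as noted above, if None were itself a legal result, a different sentinel would be needed.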
>> > Default should be off until it's fast, small and reliable. Also notice
>> > that if the index is kept on a hard drive, it might be faster if it's
>> > compressed (gz, for example): decompressing takes less time and more
>> > processing power than reading it fully out.
>>
>> I'm pretty sure you're mistaken here, unless your index is stored on a
>> floppy or something really slow.
>>
>> A disk read has 2 primary costs.
>>
>> Seek Time: time for the head to seek to the sector of disk you want.
>> Spin Time: time for the platter to spin around such that the sector
>> you want is under the read head.
>>
>> Spin Time is based on rpm, so for an average 7200 rpm drive, 7200 / 60
>> seconds = 120 rotations per second, so in the worst case (you just passed
>> the sector you need) you need to wait 1/120th of a second (about 8ms).
>>
>> Seek Time depends on the hard drive, but most drives provide average seek
>> times under 10ms.
>>
>> So it takes on average 18ms to get to your data, then you start
>> reading. The index will not be that large (my esearchdb is 2 megs,
>> but let's assume 10MB for this compressed index).
>>
>> I took a 10MB sqlite database and compressed it with gzip (default
>> settings) down to 5 megs.
>> gzip -d on the database takes 300ms; catting the decompressed database
>> takes 88ms (average of 5 runs, dropping disk caches between runs).
>>
>> I then tried my vdb_metadata.pickle from
>> /var/cache/edb/vdb_metadata.pickle
>>
>> 1.3megs compresses to 390k.
>>
>> 36ms to decompress the 390k file, but 26ms to read the 1.3meg file from
>> disk.
>>
>> Your index would have to be very large or very fragmented on disk
>> (requiring more than one seek) to see a significant gain from file
>> compression (gzip scales linearly).
>>
>> In short, don't compress the index ;p
>>
>> >
>> > Good luck!
>> >
>> >>> -----BEGIN PGP SIGNED MESSAGE-----
>> >>> Hash: SHA1
>> >>>
>> >>> Emma Strubell wrote:
>> >>> > 2) does anyone really need to search an overlay anyway?
>> >>>
>> >>> Of course.
>> >>> Take large (semi-)official overlays like sunrise. They can
>> >>> easily be seen as a second portage tree.
>> >>> -----BEGIN PGP SIGNATURE-----
>> >>> Version: GnuPG v2.0.9 (GNU/Linux)
>> >>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>> >>>
>> >>> iEYEARECAAYFAkk0YpEACgkQ4UOg/zhYFuD3jQCdG/ChDmyOncpgUKeMuqDxD1Tt
>> >>> 0mwAn2FXskdEAyFlmE8shUJy7WlhHr4S
>> >>> =+lCO
>> >>> -----END PGP SIGNATURE-----
>> >>>
>> >> On Mon, Dec 1, 2008 at 5:17 PM, René 'Necoro' Neumann <[EMAIL PROTECTED]>
>> >> wrote: