On 03/14/2013 01:10 PM, Michael Schroeder wrote:
On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote:
Yup, detecting and automatically regenerating out-of-sync indexes is pretty
much a must (yet something we currently don't have either, sigh)

Some other "issues" in the current implementation AFAICS:
- The ability to grab all keys of an index is missing, which would be
needed for the newish index iterator API. I always had the feeling that API
might come back to bite us at some point...

I already added both rpmidxList() and rpmpkgList() last night. ;)

Ok, good :)

- Index keys are limited to strings whereas we currently have others too,
but then all the actually interesting indexes have string keys, and we
might well be able just to eliminate the others (or convert the data into
strings)

Yes, I noticed that after checking rpm's current database code. I can
easily switch the rpmidx functions to use binary as keys if you like,
it just makes the rpmidxList function a bit awkward as it can no longer
return an array of strings.

I think strings are fine, just thought to note that there are those couple of non-string indexes which we need to do something about. Sigmd5 is probably better just axed, Installtid we might want to keep but that can just as well be converted into a string.
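
To illustrate the "convert into a string" option for something like Installtid, here is a minimal sketch. The helper name tid_to_key() and the fixed-width hex format are assumptions made for the example, not anything in rpm or in the patch; zero-padded hex is used only because it keeps string order the same as numeric order.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render a 32-bit install tid as an 8-digit,
 * zero-padded lowercase hex string so it can serve as a string key.
 * buf must hold at least 9 bytes (8 digits plus the terminating NUL).
 * Returns 0 on success, -1 if the buffer is too small. */
static int tid_to_key(uint32_t tid, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    if (buflen < 9)
        return -1;
    n = snprintf(buf, buflen, "%08" PRIx32, tid);
    return n == 8 ? 0 : -1;
}
```

With this, a tid of 255 becomes the key "000000ff", and a buffer shorter than 9 bytes is rejected rather than truncated.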

BTW shouldn't those h2be() and be2h() calls be htonl() and ntohl() instead?

Yes, we could use those instead. I just didn't like to include the
"arpa/inet.h" header file, it kinda felt wrong.
There's also htobe32/be32toh in endian.h if we define _BSD_SOURCE; that
seems to be a better choice.
As I wasn't sure what to do I decided to postpone the issue by using
my own inline functions for now ;)

Heh. Including <arpa/inet.h> for non-networking purposes does indeed feel a bit odd, but that's likely the standard and "portably correct" way of doing endian conversions, which at least in glibc are system-optimized as well. <endian.h> is apparently not very standard.

Hmm... rpm seems to include <netinet/in.h> directly, which works with glibc but is not what standards and man pages say about htonl() and friends.
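
For reference, a minimal sketch of what the htonl()/ntohl() variant could look like with the POSIX header <arpa/inet.h>. The function names store_be32() and load_be32() are made up for the example and are not the names used in the patch; memcpy() is used so the on-disk slot does not need to be aligned.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() as specified by POSIX */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a host-order value into a 4-byte slot
 * in network byte order (big endian). */
static void store_be32(unsigned char *slot, uint32_t value)
{
    uint32_t be = htonl(value);

    assert(slot != NULL);
    memcpy(slot, &be, sizeof(be));
}

/* Hypothetical helper: read a big-endian 4-byte slot back into
 * host byte order. */
static uint32_t load_be32(const unsigned char *slot)
{
    uint32_t be;

    assert(slot != NULL);
    memcpy(&be, slot, sizeof(be));
    return ntohl(be);
}
```

On a big-endian host htonl() and ntohl() are no-ops, on a little-endian host they swap, so the bytes in the slot come out the same either way.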


The idea seems to be keeping the database and indexes in big-endian, i.e.
network byte order (which is good IMO), but currently it's unconditionally
byteswapping, so big-endian systems would have the dbs in little-endian
format and little-endian systems in big-endian. Or am I totally missing
something here?

Yes, the code always uses big endian. It doesn't unconditionally swap.
(It also does unaligned reads/writes, but we don't really need that.)

Ok. I'm not having one of my brightest days apparently ;)
Guess I was expecting to see those "on big endian do nothing" ifdefs in there.


Coming back to automatic regeneration of out-of-sync indexes, there's
still another way to do the implementation: keep those indexes in memory
and don't store them to disk at all.
This means that the indexes need to be generated on the fly at first
access by reading all headers, which in turn means we need to additionally
store a stripped version of each header that just contains the interesting
bits.

Advantages:
- just one single database file
- no out-of-sync indexes possible

Disadvantage:
- needs a bit of time to generate the in-core indexes

For my system (2102 installed rpms) the stripped headers would be
about 2.2 MBytes to read, which takes about 0.34 seconds with my slow
disk and dropped caches. That is quite noticeable.
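
As a rough illustration of the in-core variant, the sketch below sorts a set of (key, package number) entries once at first access and then answers lookups by binary search. Everything here is hypothetical: struct idx_entry, idx_build() and idx_lookup() are names invented for the example, and reading the entries out of the stripped headers is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a hypothetical in-core index. */
struct idx_entry {
    const char *key;      /* e.g. a package name taken from a stripped header */
    unsigned int pkgidx;  /* package number in the single database file */
};

/* Order entries by key so bsearch() can be used on them. */
static int idx_cmp(const void *a, const void *b)
{
    const struct idx_entry *ea = a;
    const struct idx_entry *eb = b;

    return strcmp(ea->key, eb->key);
}

/* Build step, done once at first access: sort the entries by key. */
static void idx_build(struct idx_entry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    if (n > 0)
        qsort(entries, n, sizeof(*entries), idx_cmp);
}

/* Return an entry matching key, or NULL if there is none.
 * If several entries share a key, any one of them may be returned. */
static const struct idx_entry *idx_lookup(const struct idx_entry *entries,
                                          size_t n, const char *key)
{
    struct idx_entry probe;

    if (n == 0)
        return NULL;
    probe.key = key;
    probe.pkgidx = 0;
    return bsearch(&probe, entries, n, sizeof(*entries), idx_cmp);
}
```

Nothing is written back to disk, so there is no on-disk index that could get out of sync; the price is the read and sort at first access.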

Yeah, it seems pretty heavy for simple operations. OTOH it wouldn't hurt to have such a mode: for example if we notice indexes are corrupt/out-of-sync but we don't have the permissions to regenerate the on-disk files, it could fall back to in-memory indexes to get correct results even if it's slightly slower.

What I've had in mind is lumping all the index stuff (possibly along with actual data for the critical parts) into a single file so there'd be just two db-related files to worry about. But for now, I'm just happy to have an alternative implementation for the pkgs + index databases to play around with :)

        - Panu -
