Michael (2x),

- a huge index (a few million keys) with md5 keys (preferably a hash table)
Should be doable. You say python - have a look at the "hash" operator, which manages hash tables. And look at examples/find.py for lots more info.

- the index points to some data structure with meta info (meta_data)
  (about 10 attributes with 300-500 bytes per row)
2,000,000 x 500 bytes = 1 GB

I think you will push MK too far. It mmaps the whole file, which may become an issue on a machine with a 32-bit address space.
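
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The row count and row size are the figures quoted above; the 2 GiB "usable address space" is only an assumption about a typical 32-bit process, not a measured limit, and the estimate ignores any storage overhead.

```python
# Rough size estimate for the meta_data rows (figures from the message above).
ROWS = 2_000_000        # "a few million keys", taking 2 million
BYTES_PER_ROW = 500     # upper end of the 300-500 byte range

# Assumption: a 32-bit process can map roughly 2 GiB of user address space.
USABLE_ADDRESS_SPACE = 2 * 1024**3


def estimate_bytes(rows: int, bytes_per_row: int) -> int:
    """Raw payload size, ignoring per-row and free-space overhead."""
    return rows * bytes_per_row


total = estimate_bytes(ROWS, BYTES_PER_ROW)
fraction = total / USABLE_ADDRESS_SPACE

print(f"meta_data payload: {total:,} bytes")
print(f"share of a 2 GiB address space: {fraction:.0%}")
```

So the meta data alone would take close to half of what such a process could map, before any blobs are added.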

- most of those data structures point to a ~3kb blob field.
You will have to use a second file, and manage it separately (perhaps with MK, but not as an MK file).
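
One possible way to manage such a second file, sketched in plain Python: blobs are appended to a flat file, and only an (offset, length) pair is kept per row. The function names `append_blob` and `read_blob`, the file name, and the dict standing in for the index are all hypothetical; this is a sketch of the idea, not an MK API.

```python
import hashlib
import os
import tempfile
from typing import Dict, Tuple


def append_blob(path: str, data: bytes) -> Tuple[int, int]:
    """Append data to the flat blob file; return its (offset, length)."""
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)  # be explicit: new data always goes at the end
        offset = f.tell()
        f.write(data)
    return offset, len(data)


def read_blob(path: str, offset: int, length: int) -> bytes:
    """Read back one blob given the (offset, length) stored in the index."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    # Stand-in for the index: in practice these pairs would be two integer
    # fields in each meta_data row.
    index: Dict[str, Tuple[int, int]] = {}
    with tempfile.TemporaryDirectory() as tmp:
        blob_file = os.path.join(tmp, "blobs.dat")
        for payload in (b"first blob", b"second\x00blob with a null byte"):
            key = hashlib.md5(payload).hexdigest()
            index[key] = append_blob(blob_file, payload)
        for key, (offset, length) in index.items():
            data = read_blob(blob_file, offset, length)
            assert hashlib.md5(data).hexdigest() == key
```

An append-only file like this never reclaims space from deleted blobs; that would need separate compaction.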

- some of those data structures point to a big file like
  data structure (a blob with varying sizes 20kb-10MB)
Same.

- some data structure to maintain tree and graph like
  structures
Several ways to do that.  If you're talking millions of nodes... ouch.

- Can metakit deal with such data (if 50 GB is to big, I could
  move the blob like data into a flat file)?
MK cannot handle 50 GB, or 5 GB. Even 1 GB is pushing it (but more because of other aspects - such as free space pool size). You will have to look into the hash and blocked viewer operations; without them this is a guaranteed dead end.

As far as I know from talks with JCW last summer, Metakit doesn't play in the gigabytes-of-data league. So moving the large blobs out would probably help.

- Can I quickly sort on some columns of the meta_data?
Sort / select a million rows?  Ouch.

- Can I store binary data with metakit (null characters in
  strings)?
At least the Tcl binding can, so I assume the other bindings can do it as well.
Yes. Declare the fields as type "B".

You have not mentioned the platform/OS. It may sound like heresy, but have you considered something like ReiserFS? It's geared at huge numbers of small (file) objects (I've got two > 50 GB "datasets", i.e. disks, set up that way). I'd use that for most of the bits, and then see if MK can be used to provide access paths into such a storage system.
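
A minimal sketch of what "one file per object" could look like, in Python. Everything here is an assumption for illustration: the names `object_path`, `put` and `get`, the choice to key each object by the md5 of its content, and the two-level directory fan-out. The index (MK or otherwise) would then only need to hold the key and the meta data.

```python
import hashlib
import os
import tempfile


def object_path(root: str, key_hex: str) -> str:
    """Map an md5 hex key to a file path with two levels of fan-out."""
    # Fan-out keeps any single directory from holding millions of entries.
    return os.path.join(root, key_hex[:2], key_hex[2:4], key_hex)


def put(root: str, data: bytes) -> str:
    """Store data as its own file and return the md5 hex key."""
    key = hashlib.md5(data).hexdigest()
    path = object_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return key


def get(root: str, key_hex: str) -> bytes:
    """Fetch an object back by its key."""
    with open(object_path(root, key_hex), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        key = put(root, b"some small object")
        assert get(root, key) == b"some small object"
```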

Treating MK as a disk replacement is not going to be a fun experience. Oracle might be up to that, but not MK. Let me qualify that as "not yet", btw :)

-jcw

_______________________________________________
metakit mailing list - [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/metakit
