FYI, Freenet is using an ancient version of SnakeYAML. The current version is 1.12: http://code.google.com/p/snakeyaml/wiki/changes
It might be worth looking into upgrading.

Sent from my wireless phone.

On Jul 20, 2013 6:51 AM, "Matthew Toseland" <toad at amphibian.dyndns.org> wrote:

> On Saturday 20 Jul 2013 09:47:06 Veronica Estrada wrote:
> > Hi Toad,
> >
> > Could you elaborate more on the actual index tree and on the notation
> > you use (for instance, from chatting with others I gather that "node"
> > could mean different things). This documentation should be clear and
> > we are wasting lots of time because it is not.
>
> I've CC'ed devl for this. We should clean it up and put it into a text
> file in Library though, once it's all clear.
> >
> > Some questions are a bit confusing or repeated, but try to answer all
> > of them independently; that will help to make the documentation clear.
> >
> >
> > A) Considering:
> >
> > http://127.0.0.1:8888/freenet:USK@2BYYFG4C1kJIRsiW9GhdlYMx52tQ06LXJuoC1TjX-EE,-fd9-wyD1WpfHPuNOMi4O8XhD9z78Dwu0uYO8TQy1FU,AQACAAE/index.yml/1?type=text/plain
> >
> >
> > 1) What is node_min? node_min=1024
> >
> > Is it the size of a node and when it reaches that size, it creates
> > another node?
>
> It's the minimum size of a node in the btree. All nodes other than the
> root will be between node_min and node_max in size. node_max is
> 2*node_min.
> >
> > 2) What is a bin? Each entry line is a bin? I mean, each term is a bin?
>
> A bin is a separate insert which contains the detailed information
> (TermPageEntry's) for multiple terms.
> >
> > 3) All bins with the same URI (same id) make one node?
>
> No, bins are separate from nodes. Bins contain the detailed information
> for the TermPageEntry's associated with that node (if the node has
> children, these terms are the keys between the lower nodes).
>
> Within a term within a bin, each TermPageEntry has a relevance level,
> and if the term is enormous, we can split it up into sub-nodes. I.e.
> each term within the bin is actually a tree by relevance of
> TermPageEntry's. However most of the time the terms are small enough
> that we don't actually insert sub-nodes. So a bin is a separately
> inserted container consisting of a set of terms (keys, keywords) and
> for each one a tree by relevance of TermPageEntry's that match that
> term.
> >
> > 4) What do you call the top level tree? Is all the content in the
> > file the top level tree? I mean, when you talk about "level" I am not
> > sure if you are talking about levels of the tree itself (root, 1st
> > level, 2nd level) or about "nested trees". It will help if you can
> > also make a sketch. I can later clean up the image, digitise it and
> > create a document that explains all this.
>
> The main tree:
>   The USK. This is just a redirect.
>   The top level node of the ttab tree.
>   The bottom level node of the ttab tree. (The tree can have lots of
>   levels)
>
> From each node within the main tree (because this is a btree not a
> b*tree as it should be):
>   At least 1 bin.
>   Within each bin: Term (keyword) -> relevance -> TermPageEntry
>   (Most complex case has an extra level: Huge number of hits ->
>   sub-nodes within a term, tree of relevance -> TermPageEntry)
> >
> > 4) The ttab (term table) structure maps a term to (the collection of
> > entries for that term)
>
> Right. Overall it creates a tree of:
>   term (string) -> (relevance (float) -> TermPageEntry (detail for one hit))
> >
> > 4.1 Why is ttab empty in the index?
>
> It isn't. YAML uses indenting:
>
> ttab:
>   node_min: 1024
>   size: 240701
>   entries:
>     0axdu7jlnmmer9ucegf: !BinInfo {&id001 !FreenetURI 'CHK@jc6tojQ3lGVpFcV90VfQDEQmdvgEcFrgzB9wtMZuL44,K9f8POu6NFaf~aZTvPrH1fkFwa6Vt2ZwZTAEv3zyEvY,AAMC--8': 1}
>     '1249': !BinInfo {*id001: 6}
>     ...
> >
> > 4.2 Do I need ttab?
>
> Yes.
> >
> >
> > 5) Subnodes
> >
> >
> > 5.1 Are all entries listed below "subnodes" pointers to different
> > trees, or are they just children of the entries listed above?
>
> They are nodes. A node is a tree by definition. But they are not the
> root node.
> >
> > 5.2 The first subnode line is
> >
> > !FreenetURI 'CHK@CgUaGHkmu73R84Ip6BhHPOZXdH4eBe57l6G0E~vVSFg,x8W8p~lHqhFt8UbjaGpIEBVpmbnVSOpOUiTxQXcRkLc,AAMC--8': 1486
> >
> > What is this 1486? This number is repeated in other lines. Is it an
> > ID? Are lines with such a number connected somehow? Or is the ID 148
> > and the last digit indicates something? All lines are 148 something.
>
> I *think* it's the number of elements in the subtree. If so the tree is
> impressively well balanced. It's certainly not an ID.
> >
> > 5.3 Each line is a subnode?
>
> Yes.
> >
> > 5.4 Subnodes of which tree, the top level tree?
>
> Subnodes of the node you are currently looking at. Which sit between
> the entries.
> >
> >
> > 6) Nested B-trees: Each FreenetURI in a subnode takes me to a
> > different nested B-tree?
>
> No, each term in a bin takes you to a different nested B-tree. However
> most of the time they are trivial (i.e. only have one node, which is
> included in the bin).
> >
> >
> > ********************************************************************
> >
> > B) Considering subnodes (and visiting them):
> >
> >
> > 1) The name "subnodes" suggests a node of the lower level of the
> > tree. However I am not sure about the lkey and rkey. Is the subnode
> > pointing to two parent nodes of the top level tree?
>
> The lkey and rkey are included explicitly in the file (below the root
> node):
>   lkey: null
>   rkey: 0axdu7jlnmmer9ucegf
>
> This subnode is a child of the root node, so has one parent.
> >
> > 2) I visited the first subnode of the previous structure (A):
> >
> > http://localhost:8888/freenet:CHK@CgUaGHkmu73R84Ip6BhHPOZXdH4eBe57l6G0E~vVSFg,x8W8p~lHqhFt8UbjaGpIEBVpmbnVSOpOUiTxQXcRkLc,AAMC--8?type=text/plain
> >
> >
> > 2.1) Entries are binary metadata. How can I decode it? What is inside?
> >
> > 3) All entries have the same bin id. What do you call the entries:
> > bin, node? Please answer A.2 and A.3 applied to these entries.
>
> The entries are divided into bins, which are referred to by ID.
>
> The first time a bin ID is mentioned, we include its location on
> Freenet. This can be either:
> 1) A FreenetURI (to fetch)
> 2) Freenet-level binary metadata (slightly more complicated to fetch,
>    but more robust as it can include multiple keys with redundancy, so
>    it won't be lost if one key is lost)
> >
> > 4) The index generated with Tester.java uses binary data too. Then
> > the top level tree has FreenetURI and the second contains binary
> > data. Is Tester.java showing me the nested tree?
>
> This index?
>
> http://localhost:8888/freenet:CHK@p73nChsN9wCKImsczgGOrjBf~dTz-17CCYougnislDs,uA4D92YhUV7ZGA71tGTpSEPnI4XGVqQES7sVf7i3wnc,AAMC--8?type=text/plain
>
> AFAICS this is the same format as the Spider indexes, it's just
> smaller. It uses ProtoIndex():
>
>     public ProtoIndex(FreenetURI id, String n, String owner, String ownerEmail, long pages) {
>         this(id, n, owner, ownerEmail, pages, new Date(), new HashMap<String, Object>(),
>             new SkeletonBTreeMap<URIKey, SkeletonBTreeMap<FreenetURI, URIEntry>>(BTREE_NODE_MIN),
>             new SkeletonBTreeMap<String, SkeletonBTreeSet<TermEntry>>(BTREE_NODE_MIN)/*,
>             //filtab = new SkeletonPrefixTreeMap<Token, TokenFilter>(new Token(), TKTAB_MAX)*/
>         );
>     }
>
> So again we're only using ttab here.
> >
> > Best,
> > Veronica
>
> _______________________________________________
> Devl mailing list
> Devl at freenetproject.org
> https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
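
For whoever ends up writing this into a text file in Library: the shape Toad describes, term (string) -> (relevance (float) -> TermPageEntry), plus the node_min/node_max rule, can be sketched in plain Java roughly as below. This is only an illustration, not the Library code. TtabSketch, Hit, add, lookup, nodeMax and sizeAllowed are made-up names, java.util.TreeMap stands in for the real SkeletonBTreeMap, "size" is assumed to be a count of entries, and the assumption that the root has only the upper bound is mine (the thread only says the root is exempt from the range).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Illustrative only: none of these names come from the Library plugin. */
final class TtabSketch {

    /** Hypothetical stand-in for TermPageEntry: one hit for one term. */
    static final class Hit {
        final String page;
        final float relevance;

        Hit(String page, float relevance) {
            this.page = page;
            this.relevance = relevance;
        }
    }

    // term -> (relevance -> hits with that relevance).
    // The outer map is sorted by term, the inner map by descending relevance.
    private final TreeMap<String, TreeMap<Float, List<Hit>>> ttab = new TreeMap<>();

    void add(String term, Hit hit) {
        ttab.computeIfAbsent(term,
                t -> new TreeMap<Float, List<Hit>>(Collections.<Float>reverseOrder()))
            .computeIfAbsent(hit.relevance, r -> new ArrayList<Hit>())
            .add(hit);
    }

    /** Pages for a term, highest relevance first; empty if the term is unknown. */
    List<String> lookup(String term) {
        List<String> pages = new ArrayList<>();
        TreeMap<Float, List<Hit>> byRelevance = ttab.get(term);
        if (byRelevance == null) {
            return pages;
        }
        for (List<Hit> hits : byRelevance.values()) {
            for (Hit h : hits) {
                pages.add(h.page);
            }
        }
        return pages;
    }

    /** From the thread: node_max is 2*node_min. */
    static int nodeMax(int nodeMin) {
        return 2 * nodeMin;
    }

    /**
     * Non-root nodes must hold between node_min and node_max entries.
     * Assumption: the root is only bounded from above.
     */
    static boolean sizeAllowed(int size, int nodeMin, boolean isRoot) {
        if (size > nodeMax(nodeMin)) {
            return false;
        }
        return isRoot || size >= nodeMin;
    }
}
```

With node_min=1024 as in the index above, nodeMax(1024) is 2048, so a non-root node holding 1000 entries would violate the rule while a root of that size would not. The sketch leaves out bins entirely; in the real index the per-term detail lives in separately inserted bins rather than in memory.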
