FYI, Freenet is using an ancient version of SnakeYAML.

Current version is 1.12
http://code.google.com/p/snakeyaml/wiki/changes

It might be worth looking into upgrading.

On Jul 20, 2013 6:51 AM, "Matthew Toseland" <toad at amphibian.dyndns.org>
wrote:

> On Saturday 20 Jul 2013 09:47:06 Veronica Estrada wrote:
> > Hi Toad,
> >
> > Could you elaborate more on the actual index tree and on the notation
> > you use? (For instance, from chatting with others I gather that "node"
> > could mean different things.) This documentation should be clear, and we
> > are wasting lots of time because it is not.
>
> I've CC'ed devl for this. We should clean it up and put it into a text
> file in Library though, once it's all clear.
> >
> > Some questions are a bit confusing or repeated, but please try to answer
> > each of them independently; that will help make the documentation clear.
> >
> >
> > A) Considering:
> >
> >
> > http://127.0.0.1:8888/freenet:USK@2BYYFG4C1kJIRsiW9GhdlYMx52tQ06LXJuoC1TjX-EE,-fd9-wyD1WpfHPuNOMi4O8XhD9z78Dwu0uYO8TQy1FU,AQACAAE/index.yml/1?type=text/plain
> >
> >
> > 1)What is node_min? node_min=1024
> >
> > Is it the size of a node and when it reaches that size, it creates
> > another node?
>
> It's the minimum size of a node in the btree. All nodes other than the
> root will be between node_min and node_max in size. node_max is 2*node_min.
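
To make the size rule above concrete, here is a minimal sketch. It is not code from Library: the class and method names are made up, I'm assuming "size" means the number of entries in a node, and I'm assuming the root is still capped at node_max.

```java
public class NodeBounds {
    // node_min as it appears in the index.yml under discussion.
    public static final int NODE_MIN = 1024;
    // node_max is 2*node_min, per the explanation above.
    public static final int NODE_MAX = 2 * NODE_MIN;

    /**
     * Returns true if a node with the given size respects the bounds.
     * Assumption: the root is exempt from the minimum but not the maximum.
     */
    public static boolean isValidSize(int size, boolean isRoot) {
        if (size < 0 || size > NODE_MAX) {
            return false;
        }
        // Only the root may hold fewer than NODE_MIN entries.
        return isRoot || size >= NODE_MIN;
    }

    public static void main(String[] args) {
        System.out.println(isValidSize(10, true));    // small root
        System.out.println(isValidSize(10, false));   // small non-root node
        System.out.println(isValidSize(1500, false)); // mid-sized non-root node
    }
}
```
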
> >
> > 2) What is a bin? Each entry line is a bin? I mean, each term is a bin?
>
> A bin is a separate insert which contains the detailed information
> (TermPageEntry's) for multiple terms.
> >
> > 3) All bins with the same URI (same id) make one node?
>
> No, bins are separate from nodes. Bins contain the detailed information
> for the TermPageEntry's associated with that node (if the node has
> children, these terms are the keys between the lower nodes).
>
> Within a term within a bin, each TermPageEntry has a relevance level, and
> if the term is enormous, we can split it up into sub-nodes. I.e. each term
> within the bin is actually a tree by relevance of TermPageEntry's. However
> most of the time the terms are small enough that we don't actually insert
> sub-nodes. So a bin is a separately inserted container consisting of a set
> of terms (keys, keywords) and for each one a tree by relevance of
> TermPageEntry's that match that term.
> >
> > 4) What do you call the top level tree, all the content in the file is
> > the top level tree? I mean, when you talk about "level" I am not sure
> > if you are talking about levels on the tree itself, root, 1 level, 2nd
> > level or you are talking about "nested trees". It would help if you could
> > also draw a sketch. I can later clean up the image, digitise it, and
> > create a document that explains all this.
>
> The main tree:
> The USK. This is just a redirect.
> The top level node of the ttab tree.
> The bottom level node of the ttab tree. (The tree can have lots of levels)
>
> From each node within the main tree: (Because this is a btree not a b*tree
> as it should be):
> At least 1 bin.
> Within each bin: Term (keyword) -> relevance -> TermPageEntry
> (Most complex case has an extra level: Huge number of hits -> sub-nodes
> within a term, tree of relevance -> TermPageEntry)
> >
> > 4) The ttab (term table) structure maps a term to (the collection of
> > entries for that term)
>
> Right. Overall it creates a tree of:
> term (string) -> (relevance (float) -> TermPageEntry (detail for one hit))
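
The term -> (relevance -> TermPageEntry) shape above can be modelled in memory with nested sorted maps. This is only a rough sketch: it uses plain TreeMaps rather than Library's SkeletonBTreeMap, a String stands in for a TermPageEntry, and ordering hits by highest relevance first is my assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class TtabSketch {
    // term (string) -> (relevance (float) -> hits with that relevance).
    // A String page name is a hypothetical stand-in for a TermPageEntry.
    private final TreeMap<String, TreeMap<Float, List<String>>> ttab =
            new TreeMap<String, TreeMap<Float, List<String>>>();

    /** Records one hit for a term at the given relevance. */
    public void add(String term, float relevance, String page) {
        ttab.computeIfAbsent(term,
                    t -> new TreeMap<Float, List<String>>(Comparator.<Float>reverseOrder()))
            .computeIfAbsent(relevance, r -> new ArrayList<String>())
            .add(page);
    }

    /** Returns all hits for a term, most relevant first; empty if the term is unknown. */
    public List<String> lookup(String term) {
        List<String> out = new ArrayList<String>();
        TreeMap<Float, List<String>> byRelevance = ttab.get(term);
        if (byRelevance != null) {
            for (List<String> pages : byRelevance.values()) {
                out.addAll(pages);
            }
        }
        return out;
    }
}
```

The inner map holds a list per relevance value because two hits can share the same relevance.
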
> >
> > 4.1 Why ttab is empty in the index?
>
> It isn't. YAML uses indenting:
>
> ttab:
>   node_min: 1024
>   size: 240701
>   entries:
>     0axdu7jlnmmer9ucegf: !BinInfo {&id001 !FreenetURI 'CHK@jc6tojQ3lGVpFcV90VfQDEQmdvgEcFrgzB9wtMZuL44,K9f8POu6NFaf~aZTvPrH1fkFwa6Vt2ZwZTAEv3zyEvY,AAMC--8': 1}
>     '1249': !BinInfo {*id001: 6}
> ...
> >
> > 4.2 Do I need ttab?
>
> Yes.
> >
> >
> > 5) Subnodes
> >
> >
> > 5.1 All entries listed below "subnodes" are pointers to different
> > trees or they are just a children of the entries listed above?
>
> They are nodes. A node is a tree by definition. But they are not the root
> node.
> >
> > 5.2 The first subnode line is
> >
> >     !FreenetURI 'CHK@CgUaGHkmu73R84Ip6BhHPOZXdH4eBe57l6G0E~vVSFg,x8W8p~lHqhFt8UbjaGpIEBVpmbnVSOpOUiTxQXcRkLc,AAMC--8': 1486
> >
> > What is this 1486? This number is repeated in other lines. Is it an
> > ID? Are lines with such number connected somehow? Or is the ID 148 and
> > the last number indicates something? All lines are 148 something.
>
> I *think* it's the number of elements in the subtree. If so the tree is
> impressively well balanced. It's certainly not an ID.
> >
> > 5.3 Each line is a subnode?
>
> Yes.
> >
> > 5.4 Subnodes of which tree, the top level tree?
>
> Subnodes of the node you are currently looking at. Which sit between the
> entries.
> >
> >
> > 6) Nested B-trees: Each FreenetURI in a subnode takes me to a
> > different nested B-tree?
>
> No, each term in a bin takes you to a different nested B-tree. However
> most of the time they are trivial (i.e. only have one node, which is
> included in the bin).
> >
> >
> >
> ***************************************************************************************************************************************************
> >
> > B) Considering subnodes (and visiting them):
> >
> >
> > 1) The name "subnodes" suggests nodes on a lower level of the tree.
> > However, I am not sure about the lkey and rkey. Is the subnode
> > pointing to two parent nodes of the top level tree?
>
> The lkey and rkey are included explicitly in the file (below the root
> node):
> lkey: null
> rkey: 0axdu7jlnmmer9ucegf
>
> This subnode is a child of the root node, so has one parent.
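
A small sketch of how lkey and rkey relate to the parent's entries may help here. It assumes the usual B-tree layout, where a node with n sorted entries has n+1 subnodes sitting between them, and that null marks an open end on either side (the file only confirms the null lkey of the first subnode). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SubnodeBounds {
    /**
     * Returns {lkey, rkey} for the subnode at subnodeIndex, given the
     * parent's sorted entry keys. Subnode i sits between key i-1 and key i.
     */
    public static String[] bounds(List<String> entryKeys, int subnodeIndex) {
        if (subnodeIndex < 0 || subnodeIndex > entryKeys.size()) {
            throw new IndexOutOfBoundsException("no subnode " + subnodeIndex);
        }
        // The first subnode has nothing to its left, the last nothing to its right.
        String lkey = subnodeIndex == 0 ? null : entryKeys.get(subnodeIndex - 1);
        String rkey = subnodeIndex == entryKeys.size() ? null : entryKeys.get(subnodeIndex);
        return new String[] { lkey, rkey };
    }

    public static void main(String[] args) {
        // The first two root entries quoted from the index earlier in this thread.
        List<String> rootEntries = Arrays.asList("0axdu7jlnmmer9ucegf", "1249");
        System.out.println(Arrays.toString(bounds(rootEntries, 0)));
        // prints "[null, 0axdu7jlnmmer9ucegf]"
    }
}
```

So the lkey/rkey pair names the two neighbouring entries in the single parent, not two parents.
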
> >
> > 2) I visited the first subnode of the previous structure (A):
> >
> >
> > http://localhost:8888/freenet:CHK@CgUaGHkmu73R84Ip6BhHPOZXdH4eBe57l6G0E~vVSFg,x8W8p~lHqhFt8UbjaGpIEBVpmbnVSOpOUiTxQXcRkLc,AAMC--8?type=text/plain
> >
> >
> > 2.1) Entries are binary metadata. How can I decode it? What is inside?
> >
> > 3) All entries have the same bin id. What do you call the entries: bins,
> > nodes? Please answer A.2 and A.3 as applied to these entries.
>
> The entries are divided into bins, which are referred to by ID.
>
> The first time a bin ID is mentioned, we include its location on Freenet.
> This can be either:
> 1) A FreenetURI (to fetch)
> 2) Freenet-level binary metadata (slightly more complicated to fetch, but
> more robust as it can include multiple keys with redundancy, so it won't be
> lost if one key is lost)
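
The "define on first mention, refer by ID afterwards" scheme above is what YAML anchors and aliases (&id001 / *id001) express in the file. A minimal sketch of the same idea, with a made-up class name and with the location kept as an opaque String (either a FreenetURI or encoded metadata), might look like this:

```java
import java.util.HashMap;
import java.util.Map;

public class BinRegistry {
    // bin ID -> where the bin lives on Freenet (URI or opaque metadata).
    private final Map<String, String> locations = new HashMap<String, String>();

    /** First mention of a bin: remember its location. */
    public void define(String binId, String location) {
        if (locations.putIfAbsent(binId, location) != null) {
            throw new IllegalStateException("bin already defined: " + binId);
        }
    }

    /** Later mentions carry only the ID; look the location up again. */
    public String resolve(String binId) {
        String location = locations.get(binId);
        if (location == null) {
            throw new IllegalArgumentException("unknown bin: " + binId);
        }
        return location;
    }
}
```

A YAML parser normally does this resolution itself, so the sketch only illustrates why many entries can share one bin ID while the location appears once.
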
> >
> > 4) The index generated with Tester.java uses binary data too. Then the
> > top level tree has FreenetURI and the second contains binary data.
> > Tester.java is showing me the nested tree?
>
> This index?
>
> http://localhost:8888/freenet:CHK@p73nChsN9wCKImsczgGOrjBf~dTz-17CCYougnislDs,uA4D92YhUV7ZGA71tGTpSEPnI4XGVqQES7sVf7i3wnc,AAMC--8?type=text/plain
>
> AFAICS this is the same format as the Spider indexes, it's just smaller:
> It uses ProtoIndex():
>         public ProtoIndex(FreenetURI id, String n, String owner, String ownerEmail, long pages) {
>                 this(id, n, owner, ownerEmail, pages, new Date(), new HashMap<String, Object>(),
>                         new SkeletonBTreeMap<URIKey, SkeletonBTreeMap<FreenetURI, URIEntry>>(BTREE_NODE_MIN),
>                         new SkeletonBTreeMap<String, SkeletonBTreeSet<TermEntry>>(BTREE_NODE_MIN)/*,
>                         //filtab = new SkeletonPrefixTreeMap<Token, TokenFilter>(new Token(), TKTAB_MAX)*/
>                 );
>         }
>
> So again we're only using ttab here.
> >
> > Best,
> > Veronica
>
> _______________________________________________
> Devl mailing list
> Devl at freenetproject.org
> https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>
