On Saturday 20 Jul 2013 09:47:06 Veronica Estrada wrote:
> Hi Toad,
> 
> Could you elaborate more on the actual index tree and on the notation
> you use (for instance I guess for the chat with others that node could
> mean different things). This documentation should be clear and we are
> wasting lots of time because it is not.

I've CC'ed devl for this. We should clean it up and put it into a text file in 
Library though, once it's all clear.
> 
> Some questions are a bit confusing or repeated, but try to answer all
> of them independently; that will help to make the documentation clear.
> 
> 
> A) Considering:
> 
> http://127.0.0.1:8888/freenet:USK@2BYYFG4C1kJIRsiW9GhdlYMx52tQ06LXJuoC1TjX-EE,-fd9-wyD1WpfHPuNOMi4O8XhD9z78Dwu0uYO8TQy1FU,AQACAAE/index.yml/1?type=text/plain
> 
> 
> 1)What is node_min? node_min=1024
> 
> Is it the size of a node and when it reaches that size, it creates another 
> node?

It's the minimum size of a node in the btree. All nodes other than the root 
will be between node_min and node_max in size. node_max is 2*node_min.
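
A minimal sketch of that size rule, for illustration only. The class and method names are hypothetical and are not taken from the Library code. It assumes "size" means the number of entries held directly in a node, that the bounds are inclusive, and that the root is still capped at node_max.

```java
// Hypothetical sketch of the btree node size rule described above.
// Assumption: "size" counts the entries held directly in a node.
final class NodeSizeRule {
    // node_max is twice node_min.
    static int nodeMax(int nodeMin) {
        return 2 * nodeMin;
    }

    // The root may hold fewer than node_min entries; every other node
    // must hold between node_min and node_max entries (assumed inclusive).
    static boolean isValidSize(int size, int nodeMin, boolean isRoot) {
        if (size < 0 || size > nodeMax(nodeMin)) {
            return false;
        }
        return isRoot || size >= nodeMin;
    }

    public static void main(String[] args) {
        int nodeMin = 1024; // node_min from the index above
        System.out.println(nodeMax(nodeMin));                  // 2048
        System.out.println(isValidSize(1500, nodeMin, false)); // true
        System.out.println(isValidSize(10, nodeMin, false));   // false
        System.out.println(isValidSize(10, nodeMin, true));    // true
    }
}
```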
> 
> 2) What is a bin? Each entry line is a bin? I mean, each term is a bin?

A bin is a separate insert which contains the detailed information 
(TermPageEntry's) for multiple terms. 
> 
> 3) All bins with the same URI (same id) make one node?

No, bins are separate from nodes. Bins contain the detailed information for the 
TermPageEntry's associated with that node (if the node has children, these 
terms are the keys between the lower nodes).

Within a term within a bin, each TermPageEntry has a relevance level, and if 
the term is enormous, we can split it up into sub-nodes. I.e. each term within 
the bin is actually a tree by relevance of TermPageEntry's. However most of the 
time the terms are small enough that we don't actually insert sub-nodes. So a 
bin is a separately inserted container consisting of a set of terms (keys, 
keywords) and for each one a tree by relevance of TermPageEntry's that match 
that term.
> 
> 4) What do you call the top level tree, all the content in the file is
> the top level tree? I mean, when you talk about "level" I am not sure
> if you are talking about levels on the tree itself, root, 1 level, 2nd
> level or you are talking about "nested trees". It will help if you can
> also draw a sketch. I can later clean up the image, digitise it and
> create a document that explains all this.

The main tree:
The USK. This is just a redirect.
The top level node of the ttab tree.
The bottom level node of the ttab tree. (The tree can have lots of levels)

From each node within the main tree: (Because this is a btree not a b*tree as 
it should be):
At least 1 bin.
Within each bin: Term (keyword) -> relevance -> TermPageEntry
(Most complex case has an extra level: Huge number of hits -> sub-nodes within 
a term, tree of relevance -> TermPageEntry)
> 
> 4) The ttab (term table) structure maps a term to (the collection of
> entries for that term)

Right. Overall it creates a tree of:
term (string) -> (relevance (float) -> TermPageEntry (detail for one hit))
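
The shape above can be modelled in memory roughly as follows. This is a sketch under stated assumptions, not the Library implementation: plain TreeMaps stand in for the on-Freenet btrees, a page name string stands in for a TermPageEntry, and ordering hits from highest to lowest relevance is a choice made for the example. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of: term (string) -> (relevance (float) -> hit detail).
final class TermTableSketch {
    // Outer map is sorted by term; each inner map is sorted by relevance,
    // highest first (an assumption made for this example).
    private final TreeMap<String, TreeMap<Float, List<String>>> ttab = new TreeMap<>();

    // Record that a page matched a term with the given relevance.
    void add(String term, float relevance, String page) {
        ttab.computeIfAbsent(term,
                        t -> new TreeMap<Float, List<String>>(Comparator.reverseOrder()))
            .computeIfAbsent(relevance, r -> new ArrayList<String>())
            .add(page);
    }

    // Pages matching a term, most relevant first; empty if the term is unknown.
    List<String> lookup(String term) {
        List<String> pages = new ArrayList<>();
        TreeMap<Float, List<String>> byRelevance = ttab.get(term);
        if (byRelevance != null) {
            for (List<String> sameRelevance : byRelevance.values()) {
                pages.addAll(sameRelevance);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        TermTableSketch table = new TermTableSketch();
        table.add("freenet", 0.2f, "pageA");
        table.add("freenet", 0.9f, "pageB");
        System.out.println(table.lookup("freenet")); // [pageB, pageA]
    }
}
```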
> 
> 4.1 Why is ttab empty in the index?

It isn't. YAML uses indenting:

ttab:
  node_min: 1024
  size: 240701
  entries:
    0axdu7jlnmmer9ucegf: !BinInfo {&id001 !FreenetURI 'CHK@jc6tojQ3lGVpFcV90VfQDEQmdvgEcFrgzB9wtMZuL44,K9f8POu6NFaf~aZTvPrH1fkFwa6Vt2ZwZTAEv3zyEvY,AAMC--8': 1}
    '1249': !BinInfo {*id001: 6}
...
> 
> 4.2 Do I need ttab?

Yes.
> 
> 
> 5) Subnodes
> 
> 
> 5.1 Are all entries listed below "subnodes" pointers to different
> trees, or are they just children of the entries listed above?

They are nodes. A node is a tree by definition. But they are not the root node.
> 
> 5.2 The first subnode line is
> 
>     !FreenetURI 'CHK@CgUaGHkmu73R84Ip6BhHPOZXdH4eBe57l6G0E~vVSFg,x8W8p~lHqhFt8UbjaGpIEBVpmbnVSOpOUiTxQXcRkLc,AAMC--8': 1486
> 
> What is this 1486? This number is repeated in other lines. Is it an
> ID? Are lines with such number connected somehow? Or is the ID 148 and
> the last number indicates something? All lines are 148 something.

I *think* it's the number of elements in the subtree. If so the tree is 
impressively well balanced. It's certainly not an ID.
> 
> 5.3 Each line is a subnode?

Yes.
> 
> 5.4 Subnodes of which tree, the top level tree?

Subnodes of the node you are currently looking at. They sit between the 
entries.
> 
> 
> 6) Nested B-trees: Each FreenetURI in a subnode takes me to a
> different nested B-tree?

No, each term in a bin takes you to a different nested B-tree. However most of 
the time they are trivial (i.e. only have one node, which is included in the 
bin).
> 
> 
> ***************************************************************************************************************************************************
> 
> B) Considering subnodes (and visiting them):
> 
> 
> 1) The name "subnodes" suggests a node on the lower level of the tree.
> However I am not sure about the lkey and rkey. Is the subnode
> pointing to two parent nodes of the top level tree?

The lkey and rkey are included explicitly in the file (below the root node):
lkey: null
rkey: 0axdu7jlnmmer9ucegf

This subnode is a child of the root node, so has one parent.
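
To illustrate how lkey and rkey bound a subnode, here is a small sketch. It assumes keys are ordered by plain String comparison and that a null bound means "open-ended"; both are assumptions for the example, and the class and method names are hypothetical rather than taken from the Library code.

```java
import java.util.Arrays;

// Hypothetical sketch: which subnode of a node covers a given term.
final class SubnodeRangeSketch {
    // True if term lies strictly between lkey and rkey.
    // A null bound is treated as open-ended (assumption).
    static boolean covers(String lkey, String rkey, String term) {
        boolean aboveLeft = lkey == null || term.compareTo(lkey) > 0;
        boolean belowRight = rkey == null || term.compareTo(rkey) < 0;
        return aboveLeft && belowRight;
    }

    // Given the node's sorted entry keys, return the index of the subnode
    // to descend into, or -1 if the term is itself one of the entries.
    // Subnode 0 sits before the first entry, subnode 1 between the first
    // and second entries, and so on.
    static int subnodeIndex(String[] sortedEntries, String term) {
        int pos = Arrays.binarySearch(sortedEntries, term);
        return pos >= 0 ? -1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // The first two entry keys from the root node shown earlier.
        String[] entries = {"0axdu7jlnmmer9ucegf", "1249"};
        System.out.println(subnodeIndex(entries, "00abc")); // 0
        System.out.println(subnodeIndex(entries, "1000"));  // 1
        System.out.println(subnodeIndex(entries, "1249"));  // -1
        System.out.println(covers(null, "0axdu7jlnmmer9ucegf", "00abc")); // true
    }
}
```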
> 
> 2) I visited the first subnode of the previous structure (A):
> 
> http://localhost:8888/freenet:CHK@CgUaGHkmu73R84Ip6BhHPOZXdH4eBe57l6G0E~vVSFg,x8W8p~lHqhFt8UbjaGpIEBVpmbnVSOpOUiTxQXcRkLc,AAMC--8?type=text/plain
> 
> 
> 2.1) Entries are binary metadata. How can I decode it? What is inside?
> 
> 3) All entries have the same bin id. What do you call the entries: bin,
> node? Please answer A.2 and A.3 applied to these entries.

The entries are divided into bins, which are referred to by ID.

The first time a bin ID is mentioned, we include its location on Freenet. This 
can be either:
1) A FreenetURI (to fetch)
2) Freenet-level binary metadata (slightly more complicated to fetch, but more 
robust as it can include multiple keys with redundancy, so it won't be lost if 
one key is lost)
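
The "location on first mention, ID afterwards" rule can be sketched as a small lookup table. This is illustrative only: the class is hypothetical, and the location is modelled as an opaque String even though in a real index it may be either a FreenetURI or binary metadata.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of resolving bin IDs to their location on Freenet.
final class BinRegistrySketch {
    private final Map<String, String> locationById = new HashMap<>();

    // First mention of a bin: remember where it lives.
    void define(String binId, String location) {
        if (locationById.putIfAbsent(binId, location) != null) {
            throw new IllegalStateException("bin already defined: " + binId);
        }
    }

    // Later mention: only the ID is given, so look the location up.
    String resolve(String binId) {
        String location = locationById.get(binId);
        if (location == null) {
            throw new IllegalStateException("bin referenced before definition: " + binId);
        }
        return location;
    }

    public static void main(String[] args) {
        BinRegistrySketch bins = new BinRegistrySketch();
        // "CHK@placeholder" is a made-up value, not a real key.
        bins.define("id001", "CHK@placeholder");
        System.out.println(bins.resolve("id001")); // CHK@placeholder
    }
}
```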
> 
> 4) The index generated with Tester.java uses binary data too. Then the
> top level tree has FreenetURI and the second contains binary data.
> Tester.java is showing me the nested tree?

This index?
http://localhost:8888/freenet:CHK@p73nChsN9wCKImsczgGOrjBf~dTz-17CCYougnislDs,uA4D92YhUV7ZGA71tGTpSEPnI4XGVqQES7sVf7i3wnc,AAMC--8?type=text/plain

AFAICS this is the same format as the Spider indexes, it's just smaller: It 
uses ProtoIndex():
        public ProtoIndex(FreenetURI id, String n, String owner, String ownerEmail, long pages) {
                this(id, n, owner, ownerEmail, pages, new Date(), new HashMap<String, Object>(),
                        new SkeletonBTreeMap<URIKey, SkeletonBTreeMap<FreenetURI, URIEntry>>(BTREE_NODE_MIN),
                        new SkeletonBTreeMap<String, SkeletonBTreeSet<TermEntry>>(BTREE_NODE_MIN)/*,
                        //filtab = new SkeletonPrefixTreeMap<Token, TokenFilter>(new Token(), TKTAB_MAX)*/
                );
        }

So again we're only using ttab here.
> 
> Best,
> Veronica
