Re: [freenet-dev] questions about Library for my GSoC project

Ximin Luo Tue, 21 May 2013 13:47:55 -0700

On 21/05/13 17:11, leuchtkaefer wrote:
> Hi Ximin,
> Thanks for your answer. I will rephrase what you wrote (adding my own view) 
> to check if I understand.
> 
> 
> I understand that each node constructs a SkeletonBTreeMap (a huge index) that 
> in the long term will contain all the successful searches initiated by that 
> node or that passed through that node. Using the SkeletonBTreeMap, each node 
> has a partial view of the documents stored in the freenet system, but only of 
> documents that passed through that node.
> 
> The paragraph starting with "For Library's B-Tree, this is not feasible" is 
> hard to understand. 
> 
> Since the datastore uses an LRU-like cache replacement policy, old files are 
> deleted when a node's datastore size is exceeded. This should be reflected in 
> the huge index maintained by all remote nodes that have links to that recently 
> deleted file. But it is not possible to reflect it: nodes don't know when the 
> links/items in their index are no longer valid (i.e., when a remote node 
> deleted the file as part of its LRU replacement policy).
>


It sounds like you're confusing the Library index with the freenet datastore. 
The former is like a filesystem abstraction, the latter is like a low-level 
block device. There is no feasible way for the former to control the operation 
of the latter, and it would be undesirable in any case as a layer violation.
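
To make the layering concrete, here is a minimal Python sketch. It is not 
freenet code: LruBlockStore, its capacity and the key names are hypothetical 
and chosen only for illustration. The layer above sees only put() and get(); 
eviction happens inside the store, so a block written by the upper layer can 
vanish without the upper layer having any say in it.

```python
from collections import OrderedDict


class LruBlockStore:
    """Toy block store with LRU eviction (hypothetical; not freenet's datastore)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._blocks = OrderedDict()  # oldest entry first

    def put(self, key, block):
        self._blocks[key] = block
        self._blocks.move_to_end(key)  # mark as most recently used
        while len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # drop the least recently used block

    def get(self, key):
        if key not in self._blocks:
            return None  # the caller cannot tell "never stored" from "evicted"
        self._blocks.move_to_end(key)
        return self._blocks[key]


store = LruBlockStore(capacity=2)
store.put("index-root", b"root node bytes")
store.put("other-1", b"x")
store.put("other-2", b"y")  # pushes "index-root" out of the store

root = store.get("index-root")  # None: the index block was evicted
```

Nothing in this interface lets the index pin "index-root" or learn that it was 
dropped, which is the sense in which the upper layer cannot control the lower 
one.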

I'm also unconvinced that it would be desirable for nodes to tell other nodes 
about LRU drops, due to the potential leak of information. Do you have an 
argument to show that there is no reduction in security?

> If I understood correctly, we can continue discussing what is written below. 
> If not, you can forget about the part below and give me more help to follow 
> your previous e-mail.
> 
> In that case maybe the solution is an announcement policy that broadcasts to 
> neighbors that such a file (key) is not valid anymore. Such messages will be 
> harmless and not too promiscuous, won't they? Although, neighbors who didn't 
> know such a file was available through such a node will learn it through that 
> kind of announcement. Is that a problem?
> 
> For instance, take the following scenario:
> (assumptions: for simplicity 1 identity per node, index items are simplified 
> to {key,location})
> Neighbors of Node1(n1): {n2, n3, n4, n5, n7}
> 
> Neighbors of Node4(n4): {n1, n30, n7}
> Scenario:
> 1) Request of n5 to n1: "Give me file with key=200"
> 2) n1 index contains {key=201,n4}
> 3) n1 decides to forward request to n4:  "Give me file with key=200"
> 4) n4 answers to n1 with file
> 5) n1 forwards file to n5
> 6) n1 stores a copy of the file in its cache datastore
> 7) n1 stores {key=200,n4} in his index
> 8) n1 stores {key=200,n5} in his index (n1 does not know n5 is final 
> destination).
> 9) n1 receives another request for key 200, he doesn't need to forward 
> request because the file is stored in its cache
> 10) (time passes) file is deleted from n1's LRU cache
> 11) (time passes) n1 receives another request for key 200 from n3.
> 12) n1 needs to decide whether to forward request to n4 or n5. Not sure what 
> the criteria are here, maybe he uses the node's reputation (WoT).
> 13) Some time later, file with key=200 is deleted from n4's datastore because 
> of LRU policy.
> 14) n4 broadcasts to its neighbors n1, n30, n7 that key=200 is eliminated
> 
> However, the fact that key=200 can be found through n4 may imply that node n4 
> knows how to get key=200. There is another reason that makes me think this 
> assumption is valid. Nodes with similar keys are clustered together, aren't 
> they?
> Then, which routing decision is better for key=200? n5 or still using n4? *at 
> this point I am bewildered*
> Not sure if n1 should delete {key=200,n4} from his index when he receives the 
> n4 broadcast.
> 
> Some comments: I am not sure if it is possible to store a key with multiple 
> locations as in steps 7 and 8; I guess it is possible. I am still confused 
> about location swapping and what the consequences of location swapping are 
> for the node's index.
> 
> Maybe a silly question, but what do you mean by "top-level data structure"? 
> What is the top-level data structure of Freenet?
> 

"Top-level data structure" refers to the identity of the pieces of the Library 
index as a coherent whole, and is represented by the SkeletonBTreeMap 
structure. By contrast, Bigtable/freenet provides per-row/key access, and there 
is no concept of "the entire table" or "the entire DHT" from the client's 
perspective.
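
As a rough illustration of the difference, here is a small Python sketch. It 
is only a sketch under stated assumptions and not Library's actual code: 
KeyValueStore, LazyTreeMap, the JSON node layout and the key names are all 
hypothetical. The store answers one key at a time and has no notion of a tree; 
the client-side object holds the root key, represents the tree as a whole, and 
fetches nodes only when a lookup needs them.

```python
import json


class KeyValueStore:
    """Hypothetical stand-in for per-key storage; not freenet's real API."""

    def __init__(self):
        self._blobs = {}
        self.fetches = 0  # counts get() calls so the laziness is observable

    def put(self, key, node):
        self._blobs[key] = json.dumps(node)

    def get(self, key):
        self.fetches += 1
        return json.loads(self._blobs[key])


class LazyTreeMap:
    """Client-side handle for a whole tree; nodes are fetched on first use."""

    def __init__(self, store, root_key):
        self.store = store
        self.root_key = root_key
        self._loaded = {}  # storage key -> decoded node

    def _node(self, key):
        if key not in self._loaded:
            self._loaded[key] = self.store.get(key)
        return self._loaded[key]

    def get(self, term):
        node = self._node(self.root_key)
        while node["children"]:
            # Each child is [upper_bound, key]; the last bound must be None
            # (meaning "no upper limit"), so this loop always finds a child.
            for bound, child_key in node["children"]:
                if bound is None or term <= bound:
                    node = self._node(child_key)
                    break
        return node["entries"].get(term)


store = KeyValueStore()
store.put("leaf-a", {"entries": {"apple": "uri-1", "kiwi": "uri-2"}, "children": []})
store.put("leaf-b", {"entries": {"pear": "uri-3"}, "children": []})
store.put("root", {"entries": {}, "children": [["m", "leaf-a"], [None, "leaf-b"]]})

index = LazyTreeMap(store, "root")
result = index.get("pear")  # fetches only "root" and "leaf-b"
```

Here the lookup touches two of the three stored nodes; "leaf-a" stays unloaded 
until some later lookup needs it.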

> Regarding your security note[1]: I am not sure what you mean. I suppose that 
> you refer to the fact that a node's datastore cannot be accessed remotely. 
> Users only send requests to other nodes asking for a file, and the remote node 
> checks its datastore and answers the request. Thus, the storage is accessed 
> only locally.
> 

That note is not about the datastore, it's about the Library index which 
operates on top of it. It's stored as an SSK - do you know what that is? The 
same concept is in Tahoe-LAFS as well.

> Regarding what I am trying to achieve: I am looking to somehow accelerate the 
> speed of the search, share bookmarks (specialized in some terms) among a 
> group of people, probably by using PSK, maybe some friends of friends, and 
> improve the Library code.
> 

OK - in that case, it is vital to understand every aspect of what I'm 
describing. Thanks for your patience. :)

Ximin

> Thanks a lot again and forgive my dummy assumptions,
> 
> leuchtkaefer
> 
> 
> 
> ________________________________
>  From: Ximin Luo <[email protected]>
> To: leuchtkaefer <[email protected]>; Discussion of development issues 
> <[email protected]> 
> Sent: Tuesday, May 21, 2013 12:06 PM
> Subject: Re: [freenet-dev] questions about Library for my GSoC project
>  
> 
> On 20/05/13 22:36, leuchtkaefer wrote:
>>
>>
>> Hi infinity0,
>>
>> My proposal to GSoC13 is highly related to your code (Library). 
>>
>> First, do you have any extra documentation on the code that you think 
>> could be useful for me to understand the most important parts, such as the 
>> SkeletonBTreeMap?
>>
> 
> Hello,
> 
> I did Library for GSoC 2009 and back then I was inexperienced with building
> and engineering large software codebases (such as freenet and its plugin
> ecosystem). There are many aspects of Library that I would do differently
> today if I was re-doing that project.
> 
> A large part of Library focuses on serialisation of massively-large(1) data
> structures, implemented *on top of* freenet's decentralised(2) storage. (1) and
> (2) together is what makes the problem hard.[1] For my GSoC 2009 project, I
> tried to solve this problem by implementing a load-on-demand local data
> structure (SkeletonBTreeMap) that represents the *overall* data structure (a
> B-tree) as it exists on freenet storage.
> 
> By contrast, massively scalable distributed systems such as Bigtable, and even
> the underlying freenet DHT storage system, never expose the *overall* data
> structure to the clients of those systems - instead they allow piece-by-piece
> access, e.g. by row, or by key, and the client never sees the top-level data
> structure.
> 
> For Library's B-Tree, this is not feasible, because (due to the design of
> freenet) we cannot offload computation (i.e. data structure book-keeping) onto
> other nodes.[2] It was also not feasible to use freenet's decentralised storage
> more directly, because it has certain properties (such as LRU cache) that are
> not acceptable for a search index.
> 
> So. That was an overview of the abstract algorithmic issues surrounding the
> design of Library. Please let me know if any part of what I just said is not
> understandable. Every sentence makes an important theoretical point. If you do
> not fully understand *any part*, ask me to clarify, otherwise I fear that you
> may repeat the same mistakes that I made. This is not exactly a problem since
> GSoC is partly about learning - but it would be sub-optimal for the project's
> progress.
> 
> I'll hold off on answering the rest of your questions to give you a chance to
> digest my previous answers. Understanding those will make it easier for you to
> understand my answers to the next section - and you may even be able to figure
> those answers out for yourself without me explaining it explicitly.
> 
> Also, if you give me some context on what you're trying to achieve, I can give
> more specific advice.
> 
> Let me know how you get along, and good luck!
> Ximin
> 
> [1] We are lucky that we don't have to further worry about security because the
> underlying freenet storage allows us to restrict access to one single user.
> [2] Perhaps one day, a system that supports fully homomorphic encryption will
> allow this to happen.
> 
>> Second, I have some questions:
>>
>> 1. You disabled the "boolean internal_entries" inside the class 
>> SkeletonBTreeMap and use option 2. I don't understand what you mean by a 
>> dummy serialiser that copies task.data to task.meta. What does task.data 
>> contain?
>>
>> 2. What does it mean to deflate/inflate the node?
>>
>> 3. What is a GhostNode? I understood that it is an undesirable structure used 
>> to contain some metadata or something related to the serialiser, and that it 
>> needs to be removed.
>>
>> If you can elaborate more about Library, beyond the documentation already 
>> published in the wiki, it would be of great help.
>>
>> Thanks in advance,
>>
>> leuchtkaefer
>>
>>
>>
>> _______________________________________________
>> Devl mailing list
>> [email protected]
>> https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
> 
> 