> > > I had to go back and look up this thread because it was mistitled,
> > > but I don't buy your argument at all.  Contrary to your assertion,
> > > Oskar did in fact address your points directly, and convincingly.
> >
> > Yes, I have just read his answer, but it is your opinion that he
> > addressed my points convincingly, not mine!
> But in your reply to my post you cut off everything except two lines. You
> take up some more below, but you still haven't addressed my main defense
> for why it works, namely that the propagation of the update is exactly
> like the propagation of the newly inserted data.

A few things.  Firstly, that is rather the pot calling the kettle black,
since your criticism of my proposal was sketchy at best (basically "this is
broadcast, broadcast is bad", without significant further explanation; now
even LDC, not someone prone to agreeing with me, concedes that a very
restricted broadcast might be a good idea).  The difference between the
propagation of this update and the propagation of an insert is that when an
insert is propagated, you can be sure that Freenet won't be full of nodes
already caching the data you are inserting.  Our ideas are more alike than
you suggest: I agree that we need a mechanism to bypass locally cached
data, and some form of expiry would allow this.  Your proposal addresses
this issue, which is why I tried to incorporate it into my design.

> > As I will outline towards the end of this email, I think Oskar does make
> > some good points, but I think his proposal avoids the tough issues which
> > my proposal tackled head-on, making it vulnerable to attack (remind
> > anyone of the KHK/CHK debate? ;)  Oskar's suggestion about sending the
> > update to 10-15 nodes is dubious at best: where do we get these nodes
> > from?  The datastore?  But the DataStore is likely to be filled with
> > nodes that are already in our part of information space, so they will
> > probably just take the same route to the "epi-center" of the data
> > anyway.
> 
> a) I have never avoided any of your arguments. Nobody is perfect, but I
> believe myself to be above simply avoiding objections because I cannot
> answer them.

I did not suggest you were avoiding arguments (or didn't intend to); I
merely suggested that there is a trend for me to propose something that
addresses the tough issues (like how people *find* data in Freenet, KHKs
being a very primitive solution), whereas others propose things which don't
address the tough issues, and then attack my proposals (CHKs which, in
themselves, offer no good answer to how people find data without being told
what the CHK is).  I am not blaming you for this; it is just something I
have observed a few times.

> What I want is for Freenet to work as well as possible; if my ideas are
> flawed (and like you noted, I have of course had flawed ideas, though I
> try to filter most myself before posting) then out they go.

Me too.  I would like to think I am quite good about going along with
someone else's idea when it seems I am in a minority, *even* when I remain
unconvinced (such as when I thought meta-data should be part of the
data-stream and only separated out by the client; I seem to recall that
people recently came around to my way of thinking on this, but I resisted
the "I told you so").  On this issue, though, I think it is too important
not to be debated fully.

> If anything, what reminds me of the CHK debate is that I answer and
> answer and answer to the best of my ability, and you still accuse me of
> avoiding answering. I truly don't know what to do to satisfy you in this
> regard.

The question whose answer you were avoiding is "how do you find data under
a CHK key, without being told the key through some other mechanism, in a
way that is decentralized and doesn't suffer from all the problems of
KHKs?".  Anyway, that is water under the bridge; we have agreed to
implement searching, so let's forget it.

> b) I think you misunderstand me. The Update gets sent to 10-15 nodes
> because it is sent just like a normal InsertRequest with an HTL (I guess)
> of around 10-15. If no follow-through request for the data can reach it
> when it has gone this far, then how do requests find newly inserted data,
> which also traveled 10-15 nodes using standard Request routing?

But this means that the insert only reaches one "epi-centre" node, and a
line of nodes between you and it.  It will still be shielded by nodes
caching the old data.  You suggest a special request which "penetrates"
this shield of cached data, but that creates the problem I describe below.

> If the answer is yes, then the follow-through requests will find the
> updated data, because they route with respect to the key exactly like
> requests for this new data in my example would with respect to the very
> close key.

But the problem is that in this case the node, or small number of nodes,
which actually have the updated data will receive all requests sent with
this "follow-through" flag set.  Now, from the user's perspective they are
*always* going to want the latest version of the data, thus they are always
going to set the follow-through flag, and thus if the data is in any way
popular (such as a Freenet version of /.) these central servers will
rapidly fall over.  Freenet will, in effect, no longer live up to its
promise of being slashdot-effect-proof.
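To make the concentration concrete, here is a toy load calculation (the
function and all the numbers are my own illustration, not anything from the
actual protocol): if every reader sets the follow-through flag, every
request bypasses the caches and terminates at the same handful of
update-holding nodes, however many intermediate nodes cache the data.

```python
def requests_per_holder(readers, holders, follow_through_fraction):
    """Toy load model: follow-through requests bypass caches and land
    on the nodes actually holding the updated data."""
    hitting = readers * follow_through_fraction
    return hitting / holders

# 100,000 readers of a popular page, 3 "epi-centre" nodes holding the update:
print(requests_per_holder(100_000, 3, 1.0))      # tens of thousands each
# versus ~2,000 caching nodes able to answer after an "explosion":
print(requests_per_holder(100_000, 2_000, 1.0))  # 50.0 each
```

The point is only the ratio: the follow-through flag divides the whole
readership over the epi-centre nodes, while an explosion divides it over
the (much larger) caching population.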

If, on the other hand, we try to make more nodes capable of responding to
DataRequests for the updated data, using the "explosion" mechanism I
propose, then a much larger number of nodes (hopefully a number
proportional to the number of nodes actually caching the data) can respond
to these DataRequests, and we avoid the /. effect.

The argument that this "explosion" of messages will swamp the network is
also incorrect.  Think about it: what is the ideal result of an update
(whatever the mechanism)?  It is that all of the nodes currently caching
the data to be updated will, after the update, be caching the updated data.
This means that at some point, sooner or later, they must receive the
update, whether through my explosion mechanism or through your mechanism
where updates are carried in DataReplies.  Either way, there is a lower
bound on the number of messages which must be sent for a complete
data-update, and this lower bound is directly proportional to the number of
nodes caching the data.  If an "explosion" is done correctly, it should
result in a number of messages roughly proportional to the number of nodes
caching the data.
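A rough Python sketch of what "done correctly" might mean (the topology and
all names here are my own illustration, not Freenet code): the update floods
only along links to nodes that also cache the key, so each caching node is
processed once and the message count is bounded by the number of caching
nodes times the node degree, independent of the total network size.

```python
import random
from collections import deque

def explode_update(adjacency, caches, origin):
    """Flood an update through caching nodes only; count messages sent.

    adjacency: dict node -> list of neighbour nodes
    caches:    set of nodes currently caching the (stale) data
    origin:    node where the update is injected
    """
    updated = {origin}
    queue = deque([origin])
    messages = 0
    while queue:
        node = queue.popleft()
        for neigh in adjacency[node]:
            if neigh in caches:       # constrained: only caching nodes propagate
                messages += 1         # one update message per edge traversed
                if neigh not in updated:
                    updated.add(neigh)
                    queue.append(neigh)
    return updated, messages

# Hypothetical random network: 200 nodes, 6 links each, ~50 caching the key.
random.seed(1)
nodes = range(200)
adjacency = {n: random.sample([m for m in nodes if m != n], 6) for n in nodes}
caches = set(random.sample(list(nodes), 50)) | {0}
updated, messages = explode_update(adjacency, caches, origin=0)
print(len(updated), messages)  # messages <= |caches| * degree, not O(|nodes|)
```

Whether the real subgraph of caching nodes is connected enough for this
flood to reach most of them is exactly the open question in this thread;
the sketch only shows that the message count obeys the claimed bound.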

> But it does answer the Request, it just performs a very light "make sure there
> is no newer data within reach" operation on certain requests.

But this "make sure..." process will result in a /. effect on popular
data as I point out above.

> I still don't think this sort of "constrained explosive" routing will
> work downstream. Having cached the data is simply not equivalent to
> having a link from the epi-center. Why should it be?

Can you clarify this - I don't understand what you mean here.

> I think you get stuck at an unholy compromise between not working and
> causing too many messages.

If the explosion won't work, then your proposal definitely won't work,
since it results in far fewer nodes receiving the update initially.  As for
too many messages, see my argument above.

> The update-by field was meant to be meta-data in the content so as to
> give clients an idea of when they need to do follow-throughs.
> 
> I think that having such a field in the data on the network that actually
> kills off the data is horrible. Not only does it really hurt usability
> that you have to know exactly when you will update, but it has major
> sustainability issues. Say "Dissident X" runs a Freenet page about how
> bad Regime Y is. Because this is updated weekly, he needs to set it to
> die every week in nodes that cache it.
> But then Dissident X gets caught and shot by Regime Y (it wasn't
> Freenet's fault, his woman ratted him out!) While his loss is a sad
> thing, at least we want to make sure that it does not mean that his
> famous page with info about Regime Y disappears too!

Well, the dissident knew what he was doing when he set the expiry; he
doesn't have to do this, it is just useful because it will result in a
faster update.  He could always have two versions of the data, one with an
expiry, which updates quickly, and the other without, which updates less
efficiently.

> I will take the appropriate step and make the same modification to my own
> proposal, but one that does not involve the pitfalls of actually killing
> off data on the network.
> 
> We use a deep/follow-through request system. However, data contains a
> storable field that gives the period during which we are sure we will NOT
> get another update. During this period, even follow-through requests will
> terminate on finding the data.
> 
> It still isn't perfect since it means data updates will propagate badly,
> if at all, during this period, but not being able to update for a while
> is a lot better than the data dying if you are unable to update.

This is fine from the client end of things, but it doesn't address my
concerns about the /. effect; the explosion propagation mechanism does.
I would be happy with this combined with some form of explosion for
propagation which required a number of messages proportional to the number
of nodes caching the data.
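For what it's worth, the stable-period idea itself seems easy to state
precisely.  A minimal sketch, with field and return-value names entirely of
my own invention (the real storable fields are not specified in this
thread): a node answers a follow-through request from cache only while the
data's declared stable period has not yet elapsed; afterwards it routes
onward past its cached copy.

```python
from dataclasses import dataclass

@dataclass
class StoredData:
    content: bytes
    stable_until: float  # hypothetical field: no newer version before this time

def handle_request(store, key, now, follow_through=False):
    """Decide whether a node can answer a request from its local cache.

    Normal requests always terminate on cached data.  Follow-through
    requests terminate only while the data is inside its declared stable
    period; otherwise they route onward in search of an update.
    """
    data = store.get(key)
    if data is None:
        return "route-onward"        # nothing cached, keep routing
    if not follow_through:
        return "reply"               # ordinary request: cached copy is fine
    if now < data.stable_until:
        return "reply"               # stable period: no newer version can exist
    return "route-onward"            # stable period over: look upstream

store = {"KSK@regime-y": StoredData(b"page v1", stable_until=1000.0)}
print(handle_request(store, "KSK@regime-y", now=500.0, follow_through=True))
print(handle_request(store, "KSK@regime-y", now=2000.0, follow_through=True))
```

Note that unlike the expiry approach, nothing here ever deletes the cached
data; an elapsed stable period only stops the cache from short-circuiting
follow-through requests.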

Ian.

_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev