[Freenet-dev] Updatable data

Oskar Sandberg Thu, 11 May 2000 00:14:32 +0200

On Wed, 10 May 2000, Ian Clarke wrote:

> > a) The "explosion" makes no use of the natural tendencies of Freenet
> > (references converge to a point where the data can be found, they don't
> > diverge to everywhere it is cached), and as such will have little or no 
> > effect.
> 
> References are a two-way thing - if node A has similar data to node B,
> it is just as likely to have forwarded requests that were answered by
> node B, as node B is to have forwarded requests that were answered by
> node A (or so my intuition tells me - and intuition has got us this
> far!).  Also, I seem to recall implementing a mechanism by which
> occasionally a DataSource is pointed to where a DataReply is sent to,
> rather than where it is sent from (this is to aid node discovery).  I
> can't see this in the code right now though, has it been removed?  It
> was there for a reason!


a) References being a two-way thing is true to some degree. But the effect is
FAR from enough to make the explosion useful. A datastore contains 500
references on average. In order to route, a node picks the best out of these
500, so the search gets 500 times more refined with each step.

In other words, node 4 in my five-node example below could be the 500th
"closest" center on Freenet. Node 3 could be the 250,000th closest. Node 2 would
probably have hardly any connection to the epi-center's value at all (though it
has cached the data, should it get another request), and node 1, which was
simply the local node of the requester, has none.
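A back-of-the-envelope sketch of the refinement argument above. It assumes the idealized model implied by the figures in this message: every hop picks the best of 500 references, so each step back from the epi-center multiplies a node's closeness rank by 500. The function and constant names are hypothetical.

```python
# Idealized model (assumption): each routing hop picks the best of
# REFS_PER_NODE references, so closeness rank grows by that factor per hop
# as you walk back from the epi-center toward the requester.
REFS_PER_NODE = 500


def closeness_rank(hops_from_epicenter: int, refs: int = REFS_PER_NODE) -> int:
    """Rough closeness rank of a node, counting the epi-center itself as rank 1."""
    return refs ** hops_from_epicenter


# Five-node example: node 5 is the epi-center, node 1 the requester's local node.
for node in range(5, 0, -1):
    print(f"node {node}: rank ~{closeness_rank(5 - node):,}")
# node 5: rank ~1
# node 4: rank ~500
# node 3: rank ~250,000
# node 2: rank ~125,000,000
# node 1: rank ~62,500,000,000
```

Under this model nodes 1 and 2 are effectively unrelated to the key, which is why references pointing back toward them carry little routing value.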

The thing is, even if you reject my logic here for some reason (as I am sure
you already have), if the reverse references are as strong as the forward
references (or even some fraction as strong), then Freenet requests cannot work,
because they would end up oscillating back the way they came (after all, the
fact that we have this drastic refinement in each step is exactly why we believe
we can locate data fast). If we are to have data traveling both toward and away
from the "epi-center", then we HAVE to have two different reference spaces for
it.

b) You have not implemented, nor to the best of my knowledge discussed (at
least not since I joined), setting the references backwards on Requests some
of the time. While this is a method that could help node discovery (node
discovery is broken as hell currently, but we can discuss that another day), it
is not a method I like, for various reasons.

And for it to help in this case, references would have to be set in the wrong
direction so often that it would screw up the efficiency of searches
completely (because again you are using one reference space for two purposes:
finding the data "epi-center" when the data is requested, and finding the nodes
that requested it from the data "epi-center").

> > b) Using "expiry" causes one's data to die if one is unable to update it.
> > Not using expiry makes updates infeasible because of a).
> 
> This is true, but information producers will be aware of the risk when
> they set the expiry, if they want data to be around forever then they
> shouldn't do this (or they should store it separately and hyperlink to
> it from the updated page).

So then people can't have data that is updatable and that also lasts should
they become unable to update it. That stinks.

> > But you argued that even with a request that penetrates this it would not
> > work. It does work (if normal requests and inserts do). You are right that
> > it increases the number of messages that nodes will see, but I still don't
> > believe your version will work at all.
> 
> Firstly, you have yet to convincingly justify your belief that my
> version won't work (see above), and secondly, it is an understatement to

You do not agree with my argument that it does not work. I maintain that I have
presented a very clear and convincing justification for why it doesn't, but you
are rejecting it without any convincing refutation (instead you cut it out, see
below).

> say that it will lead to an increase in the number of messages, it
> *will* lead to the slashdot effect.  Your protest against this, that the
> slashdot effect won't happen because the DataRequests will be for small
> data redirecting to CHKs, is rather weak - some poor sod running a
> Freenet node across his small ISDN line which happens to be the
> epi-centre of the next Starr Report is not going to survive regardless
> of how small the data which is being requested is!  In this case a
> fantastically popular piece of information will become unavailable,
> precisely the situation that Freenet is designed to avoid.  The SlashDot
> may not be as acute, but it will still happen, and that must be avoided.

Well, I agree, measures should be taken to make sure that this does not occur.
Plenty of such measures have been brought up.

The "slashdot" effect of this method, using these measures, would be identical
to the "slashdot" effect on the epi-center every time the data expired in your
model.

<snip>
> > In fact, it will probably make sense from a time/effort perspective to
> > start with a non-follow-through meta-data request to see if that retrieves
> > the latest version.
> 
> Er, and just how will they know whether they have the latest version or
> not without doing a follow-through request?

You can see whether it is newer than the last version you had.

> I agree that this LM factor might be useful, although I would still be
> concerned about it leading to a SlashDot effect - it may lead to a
> reduction in hits on the central server, but this is not enough, it must
> lead to the hits on any given server not increasing at all in proportion
> to the total number of requests (as I think is the effect of the current
> dynamic caching), anything less will only delay the /. effect, not
> prevent it.  To suggest that it is sufficient to merely delay the /.
> effect would be very short-sighted.

The factor is designed to keep the number of requests that reach the epi-center
node proportional to how often the data is updated. Nodes that pass data that
is updated often will need to pass the data a lot. That cannot be helped, but
there cannot be unlimited growth in how often data is updated. Given the small
size at which we should keep updatable data, and the lightness of
follow-throughs (which, the majority of the time, do not need to send the data
at all), the load will not grow without bound either, even for the most
enthusiastic updater.
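The message does not spell out how the LM factor is computed. One plausible reading, borrowed from the last-modified heuristic used by HTTP caches (for example the LM-factor in Squid's `refresh_pattern`), is sketched below as an assumption: a cached copy counts as fresh for a fixed fraction of the time between the data's last modification and when the copy was fetched. The `CachedCopy` class, both functions, and the 0.1 default are hypothetical, not anything specified for Freenet.

```python
from dataclasses import dataclass


@dataclass
class CachedCopy:
    last_modified: float  # when the publisher last updated the data (seconds)
    fetched_at: float     # when this node obtained its copy (seconds)


def is_fresh(copy: CachedCopy, now: float, lm_factor: float = 0.1) -> bool:
    """Hypothetical LM-factor rule: fresh for lm_factor * (fetched_at - last_modified)."""
    lifetime = lm_factor * (copy.fetched_at - copy.last_modified)
    return (now - copy.fetched_at) < lifetime


def needs_follow_through(copy: CachedCopy, now: float, lm_factor: float = 0.1) -> bool:
    """Only a stale copy justifies a follow-through toward the epi-center."""
    return not is_fresh(copy, now, lm_factor)


copy = CachedCopy(last_modified=0.0, fetched_at=100.0)  # fresh for 10 seconds
print(needs_follow_through(copy, now=105.0))  # False
print(needs_follow_through(copy, now=111.0))  # True
```

With a rule of this shape, data that has gone unchanged for a long time is re-checked rarely and frequently updated data is re-checked often, which is the proportionality the paragraph above argues for.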

> > But, given the existence of a "no-update-before" date on the data, this is
> > exactly what I want to do, except that the initial spreading to other nodes
> > would not happen at insert, but as people requested the new data. I find
> > your resistance to this a little weird since this is the brilliance of the
> > Freenet design: the data is brought to more nodes capable of serving it not
> > because it is sent to a lot of places on insert, but because it is sent to
> > more places every time it is requested. The same effect should be used for
> > updates.
> 
> The reason I resist this idea, as I have stated ad-tedium is that I
> don't think this "slip-through" mechanism actually is in-keeping with
> the Freenet spirit, given that it opens up the system to the /. effect.

And I resist your idea, because I say that the reference space is in no way
optimized for finding the places where data has been sent (in fact I say it
can't be optimized both for finding the places data has been sent and for
finding those where a normal insert goes), so "fireworks" would have little or
no effect, and you need a slip-through system anyway.

I am also saying that what you are doing when you say data should be deleted
when it expires is exactly such a slip-through system, only one with a lot of
negative side effects.

A fireworks system could work, but you would need a second reference space
designed to locate the places where the data had been sent. I'm not sure
exactly how, though.

<snip>
> > This would ensure that local or nearby caches of the last version are
> > skipped, but it would also ensure that as soon as an update was found, it
> > would be returned, keeping the load off the "epi-center" nodes.
> 
> Er, would it?  Surely the updated data would require a new
> "return-newer-version-than" field with a higher version, and the
> follow-through messages would continue to follow-through as before.

Only to the extent that people believed there was a newer version and bothered
to look for it. It is an alternative to the "no-update-before" field, which
disallows looking for a new version before a certain time (exactly what
delete-on-expiry does), but without deleting the data just because the node
thinks an update may be available.
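A minimal sketch of the "return-newer-version-than" behaviour discussed in this exchange: the request carries the version the requester already holds, and the first node on the route whose cached copy is newer answers it, so stale caches are skipped without every request reaching the epi-center. The `Node` class, integer version numbers, and the fixed linear route are simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    cached_version: Optional[int] = None  # None means the node holds no copy


def route_request(route: List[Node], newer_than: int) -> Optional[Tuple[str, int, int]]:
    """Walk the route and return (node name, version, hops) from the first node
    caching a version newer than `newer_than`, or None if no node does."""
    for hops, node in enumerate(route, start=1):
        if node.cached_version is not None and node.cached_version > newer_than:
            return node.name, node.cached_version, hops
    return None


route = [Node("n1", 3), Node("n2", 3), Node("n3", 4), Node("n4", 4), Node("n5", 4)]
print(route_request(route, newer_than=3))  # ('n3', 4, 3)
print(route_request(route, newer_than=4))  # None
```

Ian's objection shows up in the second call: once everyone holds version 4, a request asking for anything newer walks the whole route, which is why some limit on how often people look (a "no-update-before" date or an LM factor) is still needed.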

<snip>
> > Sites get Slashdotted because they serve data and often run CGI scripts
> > and the like. If every request coming from being "Slashdotted" was the
> > equivalent of a ping, sites would not fall over.
> 
> If you were shocked by my initial use of the 'B' word, I am now shocked
> with this statement!  The reason for the SlashDot effect is that the
> bandwidth required by a server must be directly proportional to the
> number of people requesting the data on that server, plain and simple. 
> The only way to address this is by breaking that relationship between
> number of requests, and concentration of hits - this is what Freenet
> does (or should do!).

This is true to some extent, and you are right that it was not a good line of
argument from me (an unnecessary one, too, since the high number of requests
coming through can be kept down).

But on the issue of the actual effect of a node being "slashdotted",
regardless of whether it could happen, I think you are exaggerating the
concern. Nodes will not "fall over" if they get too many requests at the same
time; they simply stop responding to new requests. Result: the nodes routing
to that node go on to the next closest, and the epi-center moves, or rather
expands. Since the chance is very good that either the update-insert or one of
the initial requests went through the next closest node, this would work fine.
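To illustrate the overload argument: if a node stops accepting new requests, a router that tries its references in order of closeness simply falls through to the next-closest one, which in effect widens the epi-center. Keys are modelled here as plain integers and closeness as absolute difference purely for illustration; the function is hypothetical and not taken from the node code.

```python
from typing import Iterable, Optional, Set


def pick_next_hop(references: Iterable[int], key: int, busy: Set[int]) -> Optional[int]:
    """Return the reference closest to `key` that is still accepting requests."""
    for ref in sorted(references, key=lambda r: abs(r - key)):
        if ref not in busy:
            return ref
    return None  # every known reference is overloaded


refs = [10, 20, 30]
print(pick_next_hop(refs, key=21, busy=set()))  # 20
print(pick_next_hop(refs, key=21, busy={20}))   # 30
```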

<snip>
> > I request some data, and my request goes through 5 nodes, all of whom cache
> > the data, and gain a reference to the node that had the data. Nodes 1-4 now
> > have the data and a reference to node 5. Node 5 does not have to have any
> > references to nodes 1-4 (chances are it will have, but nothing says it has
> > to, and if it does it is not the result of this request).
> 
> I refer you to my explanation above about how references will tend to be
> bi-directional [as I do for the rest of your comments which I have
> snipped].  You might disagree with my intuition (for which I hope you
> would provide a counter-argument), but the whole Freenet design is based
> on my intuition, so we had all better hope it is reliable!

Either we are speaking different languages or you are purposely trying to
frustrate me. Among the lines you cut out were:

"you are relying on the somewhat vague notion that the nodes would inherently
be close from node 5's perspective since a request that fit 5's bias was routed
to them. But that is very vague if you consider that if datastores have 500
references, the search is getting 500 times more precise with each hop -
nothing says that node 4, and especially not 3, 2, and absolutely not 1 are
really that close at all. "

I knew what your answer would be, so I explained why I don't believe it
already in my first post. How the ____ can you cut that out and then attack me
for not giving a counter example? This is my counter example, I repeated it
above. Instead of cutting this, you could have told me exactly why this is not
so. 
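The five-node example can be made concrete with a small simulation. It follows the behaviour stated in that example (every node on the return path caches the data and records a reference to node 5 as the data source); the dictionary-based reference tables and the function name are assumptions for illustration, not the actual node implementation.

```python
from typing import Dict, List


def simulate_request(path: List[str], key: str) -> Dict[str, Dict[str, str]]:
    """path[-1] holds the data; every earlier node caches it and gains a
    reference key -> data source. The source itself learns nothing."""
    refs: Dict[str, Dict[str, str]] = {node: {} for node in path}
    source = path[-1]
    for node in path[:-1]:
        refs[node][key] = source
    return refs


tables = simulate_request(["n1", "n2", "n3", "n4", "n5"], key="doc")
print(tables["n1"])  # {'doc': 'n5'}
print(tables["n5"])  # {}
```

Nothing in this exchange gives node 5 a reference to nodes 1-4, which is the asymmetry the argument above relies on.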

> > Your method for getting past cached data is to delete it when it expires. My
> > method is to check if there is a newer version after it has expired. As you
> > said the difference is not that big - but it is important.
> 
> Yes, yours won't work - because how will you know whether there is a
> newer version without checking - which opens the risk of a /. effect!

No, yours won't work, because it makes data dangerously volatile and makes it
much easier for people to be silenced, and for all updatable documents that a
person had on Freenet to disappear just because they are jailed, killed, or
simply not able to get to a computer so they can reinsert them.

An LM factor would prevent any slashdot effect.
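The difference between the two expiry policies argued about here can be sketched as two cache lookups. `newer` stands for whatever version a search past the caches turns up (None if the publisher never reinserted). Both functions and the (version, expiry time) layout are hypothetical, chosen only to show what happens when the publisher can no longer update.

```python
from typing import Optional, Tuple

# (version number, expiry time in seconds) for a cached copy; assumed layout
Entry = Tuple[int, float]


def serve_delete_on_expiry(entry: Entry, now: float, newer: Optional[int]) -> Optional[int]:
    """Expired copies are deleted, so only a newer insert can answer."""
    version, expires = entry
    if now < expires:
        return version
    return newer


def serve_check_after_expiry(entry: Entry, now: float, newer: Optional[int]) -> Optional[int]:
    """Expired copies are kept; a newer version wins only if one is found."""
    version, expires = entry
    if now < expires:
        return version
    if newer is not None and newer > version:
        return newer
    return version


# The publisher can no longer reinsert: no newer version exists anywhere.
entry = (2, 100.0)
print(serve_delete_on_expiry(entry, now=150.0, newer=None))    # None
print(serve_check_after_expiry(entry, now=150.0, newer=None))  # 2
```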

> > Yes it does, because it means that as soon as the first request moves the
> > new data away from the "epi-center", Requests that find that data do not
> > have to go all the way to that "epi-center".
> 
> Of course they will, they will have to if they want to be sure that the
> data has not been updated *again*!

Which is where some sort of factoring in of expected time until the next
update, either by a "no-update-until", or by the (probably much smarter) LM
value, kicks in.

> > This is exactly the same as the reason why nodes do not sink on a new
> > insert! (which is the heart of my idea - make updates work like a normal
> > insert to as large an extent as is possible).
> 
> No it isn't, the only way (with your mechanism) to be sure that you have
> the latest version of the data is to always ignore the caches (which
> causes a /. effect), or to ignore the caches a proportion of the time
> (which merely delays the /. effect).

Make that proportion of time equal to the proportion of time that a new update
is actually needed, and you do not have the problem.
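A rough arithmetic check of this claim, assuming each request independently follows through with probability p and that the update rate is known: choosing p as the update rate divided by the request rate makes the expected number of follow-throughs reaching the epi-center equal to the update rate, independent of how popular the data is. This is an idealized model with hypothetical function names, not a mechanism specified in this thread.

```python
def follow_through_probability(requests_per_hour: float, updates_per_hour: float) -> float:
    """Hypothetical rule: follow through only as often as updates actually happen."""
    if requests_per_hour <= 0:
        return 0.0
    return min(1.0, updates_per_hour / requests_per_hour)


def expected_epicenter_load(requests_per_hour: float, updates_per_hour: float) -> float:
    """Expected follow-throughs per hour that bypass the caches and reach the epi-center."""
    p = follow_through_probability(requests_per_hour, updates_per_hour)
    return requests_per_hour * p


for requests in (1_000, 100_000, 10_000_000):
    print(requests, round(expected_epicenter_load(requests, updates_per_hour=2), 6))
```

In this model the epi-center sees about two follow-throughs per hour whether the data gets a thousand or ten million requests, which is the "not increasing at all in proportion to the total number of requests" condition Ian asks for.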

> Ian.
-- 

Oskar Sandberg

md98-osa at nada.kth.se

#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)

_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev
