[Freenet-dev] Updatable data

Ian Clarke Wed, 10 May 2000 22:34:11 +0100

> > A few things.  Firstly, that is kind of the pot calling the kettle black
> Yes, but the kettle started it...


Debatable - but I can't be arsed ;-)

> While it is true that if you kill the broadcast wherever the data is not
> found it will not grow out of proportion, that is not the only reason I
> dislike anything that has broadcast in it. Basically, any form of "we will
> send it everywhere" is fundamentally not working intelligently within the
> system and taking advantage of it. It could be that it is impossible to make
> updates work taking advantage of the current system (I think it is) but then
> we should change the system so it will be (the "downwards references" would
> be one such change, though probably not a good one), not try to force fit it.

But here again you are saying that the "fireworks" proposal (thanks to
whoever suggested that name) is somehow inelegant, or inefficient, yet
as I have pointed out (and as you have not disputed) it doesn't result
in any more messages being sent, in the long term, than your (or any
other) updating mechanism.  Given that, your assertion that it is
fundamentally unintelligent is unsupported.

> a) The "explosion" makes no use of the natural tendencies of Freenet
> (references converge to a point where the data can be found, they don't
> diverge to everywhere it is cached), and as such will have little or no
> effect.

References are a two-way thing - if node A has similar data to node B,
it is just as likely to have forwarded requests that were answered by
node B, as node B is to have forwarded requests that were answered by
node A (or so my intuition tells me - and intuition has got us this
far!).  Also, I seem to recall implementing a mechanism by which
occasionally a DataSource is pointed to where a DataReply is sent to,
rather than where it is sent from (this is to aid node discovery).  I
can't see this in the code right now though, has it been removed?  It
was there for a reason!

> b) Using "expiry" causes one's data to die if one is unable to update it. Not
> using expiry makes updates infeasible because of a).

This is true, but information producers will be aware of the risk when
they set the expiry.  If they want data to be around forever then they
shouldn't do this (or they should store it separately and hyperlink to
it from the updated page).
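
A minimal sketch of the expiry trade-off described above. The class name
ExpiringCache, the keys and the timestamps are all hypothetical and are not
taken from the Freenet code; the sketch only shows a cached copy being dropped
once its expiry passes, while separately stored data without an expiry stays
available.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    data: bytes
    expires_at: Optional[float]  # None means the producer set no expiry


class ExpiringCache:
    """Hypothetical node-local store; not Freenet's actual data store."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def put(self, key: str, data: bytes, expires_at: Optional[float] = None) -> None:
        self._entries[key] = Entry(data, expires_at)

    def get(self, key: str, now: float) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if entry.expires_at is not None and now >= entry.expires_at:
            # Expired: drop the copy so the next request has to travel further.
            del self._entries[key]
            return None
        return entry.data


cache = ExpiringCache()
# An updatable page with an expiry, hyperlinking to long-lived data.
cache.put("front-page", b"links to archive-1", expires_at=100.0)
cache.put("archive-1", b"long-lived article")  # stored separately, no expiry

fresh = cache.get("front-page", now=50.0)           # still cached
stale = cache.get("front-page", now=100.0)          # expired and dropped
archived = cache.get("archive-1", now=1_000_000.0)  # never expires
```

If the producer cannot re-insert "front-page" after time 100, that page is
gone from this node, but the article it linked to is unaffected.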

> <snip - meta-discussions about the discussion are seldom helpful>

But they are fun.

> > But this means that the insert only reaches one "epi-centre" node, and a
> > line of nodes between you and it.  It will still be shielded by nodes
> > caching the data.  You suggest a special request which "penetrates" this
> > shield of cached data, but
> 
> But you argued that even with a request that penetrates this it would not
> work. It does work (if normal requests and inserts do). You are right that it
> increases the number of messages that nodes will see, but I still don't
> believe your version will work at all.

Firstly, you have yet to convincingly justify your belief that my
version won't work (see above), and secondly, it is an understatement to
say that it will lead to an increase in the number of messages, it
*will* lead to the slashdot effect.  Your protest that the slashdot
effect won't happen because the DataRequests will be for small
data redirecting to CHKs is rather weak - some poor sod running a
Freenet node across his small ISDN line which happens to be the
epi-centre of the next Starr Report is not going to survive regardless
of how small the data which is being requested is!  In this case a
fantastically popular piece of information will become unavailable,
precisely the situation that Freenet is designed to avoid.  The SlashDot
effect may not be as acute, but it will still happen, and that must be
avoided.

> Users will not always use follow-through because it is much slower, and users
> are lazy. Freenet hops take a lot of time, currently we estimate a whopping 12
> seconds per hop, so you do not want to have the request go to maximum depth
> unless you are convinced you have to.

How many people do you know will always download the development version
of software just for those few extra features even though they know it
will be more painful?  I think most people do.  I would not like to bet
the effectiveness of Freenet on this dubious assumption about people's
behaviour.

> In fact, it will probably make sense from a time/effort perspective to start
> with a non-follow-through meta-data request to see if that retrieves the
> latest version.

Er, and just how will they know whether they have the latest version or
not without doing a follow-through request?

> That said, I do agree that it leads to more messages for certain nodes. I
> think that measures like the "no-update-before" date, or possibly the
> algorithm from Squid that Theo described this morning would work to stop it.
> After all, if your reasoning is correct then Squid would not lead to any load
> being taken off the web servers and ISPs since it allows one to force the
> proxy to check if the website has been updated. Obviously, this is not the
> case.

I agree that this LM factor might be useful, although I would still be
concerned about it leading to a SlashDot effect - it may lead to a
reduction in hits on the central server, but this is not enough, it must
lead to the hits on any given server not increasing at all in proportion
to the total number of requests (as I think is the effect of the current
dynamic caching), anything less will only delay the /. effect, not
prevent it.  To suggest that it is sufficient to merely delay the /.
effect would be very short-sighted.

> But, given the existence of "no-update-before" date on the data, then this is
> exactly what I want to do, except that the initial spreading to other nodes
> would not happen at insert, but as people requested the new data. I find your
> resistance to this a little weird since this is the brilliance of the Freenet
> design, the data is brought to more nodes capable of serving it not because it
> is sent to a lot of places on insert, but because it is sent to more places
> every time it is requested. The same effect should be used for updates.

The reason I resist this idea, as I have stated ad-tedium, is that I
don't think this "slip-through" mechanism actually is in keeping with
the Freenet spirit, given that it opens up the system to the /. effect.

> Here is an alternative to the "no-update-before" idea. Instead of the
> full follow-through in a message, you have a "return-newer-revision-than"
> field. This would not return any data unless it found data of a later version
> than that given in the field. An attempt to see if data had been updated would
> simply mean to send this sort of request with the last
> known version in the "return-newer-version-than".
> This would ensure that locally or close proximity caches of the last version
> are skipped, but it would also ensure that as soon as an update was found,
> that would be returned, keeping the load off the "epi-center" nodes.

Er, would it?  Surely the updated data would require a new
"return-newer-version-than" field with a higher version, and the
follow-through messages would continue to follow-through as before.
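
A minimal sketch of this objection, under stated assumptions: the Node class,
the route function and the four-node chain are hypothetical, and routing is
reduced to walking a fixed list of nodes.  A node answers only if it holds a
version strictly newer than the one named in the request.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    cached_version: Optional[int]  # None means nothing cached for this key


def route(chain: List[Node], newer_than: int) -> Tuple[Optional[str], int]:
    """Return (name of the answering node or None, number of nodes visited)."""
    for hops, node in enumerate(chain, start=1):
        if node.cached_version is not None and node.cached_version > newer_than:
            return node.name, hops
    # Nobody held anything newer: the request travelled the whole chain.
    return None, len(chain)


# Version 2 has already spread from the epi-centre to every cache on the path.
chain = [Node("n1", 2), Node("n2", 2), Node("n3", 2), Node("epi-centre", 2)]

first_check = route(chain, newer_than=1)  # client still holds version 1
next_check = route(chain, newer_than=2)   # client now holds version 2
```

Here first_check is answered by the nearest cache after one hop, but
next_check passes every cache and still reaches the epi-centre, because
checking for version 3 means skipping every copy of version 2.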

> Yes, I do understand this. But as I noted, swamping the network is not my only
> concern about anything "broadcast". My biggest concern in this case is that I
> don't see it working at all.

Yes, but you haven't justified this concern (particularly now that you
have conceded that message overload isn't a problem)!  Until you do, I can't
tell you why you are wrong! ;-)

> Sites get Slashdotted because they serve data and often run CGI scripts and
> the like. If every request coming from being "Slashdotted" was the equivalent
> of a ping, sites would not fall over.

If you were shocked by my initial use of the 'B' word, I am now shocked
with this statement!  The reason for the SlashDot effect is that the
bandwidth required by a server must be directly proportional to the
number of people requesting the data on that server, plain and simple. 
The only way to address this is by breaking that relationship between
number of requests, and concentration of hits - this is what Freenet
does (or should do!).
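
As a rough worked example of that proportionality (the reply sizes are
illustrative assumptions, and the link is taken to be a single 64 kbit/s ISDN
channel), the sketch below computes how many requests per second one node can
answer before its line is full.

```python
def required_bytes_per_second(requests_per_second: float, bytes_per_reply: float) -> float:
    """Bandwidth one server needs: it grows linearly with the request rate."""
    return requests_per_second * bytes_per_reply


def max_requests_per_second(link_bits_per_second: float, bytes_per_reply: float) -> float:
    """Request rate at which replies alone fill the link."""
    return (link_bits_per_second / 8) / bytes_per_reply


ISDN_CHANNEL_BPS = 64_000  # assumed link: one 64 kbit/s ISDN channel

small_reply_limit = max_requests_per_second(ISDN_CHANNEL_BPS, 100)     # 100-byte replies
large_reply_limit = max_requests_per_second(ISDN_CHANNEL_BPS, 10_000)  # 10 kB replies
```

Shrinking the reply from 10 kB to 100 bytes raises the ceiling from 0.8 to 80
requests per second, but it is still a fixed ceiling on a single node, so a
popular enough key will exceed it.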

> > > I still don't think this sort of "constrained explosive" routing will work
> > > downstream. Having cached the data is simply not equivalent to having a
> > > link from the epi-center. Why should it be?
> >
> > Can you clarify this - I don't understand what you mean here.
> 
> Very simplified:
> 
> I request some data, and my request goes through 5 nodes, all of whom cache
> the data, and gain a reference to the node that had the data. Nodes 1-4 now
> have the data and a reference to node 5. Node 5 does not have to have any
> references to nodes 1-4 (chances are it will have, but nothing says it has to,
> and if it does it is not the result of this request).

I refer you to my explanation above about how references will tend to be
bi-directional [as I do for the rest of your comments which I have
snipped].  You might disagree with my intuition (for which I hope you
would provide a counter-argument), but the whole Freenet design is based
on my intuition, so we had all better hope it is reliable!

> Your method for getting past cached data is to delete it when it expires. My
> method is to check if there is a newer version after it has expired. As you
> said the difference is not that big - but it is important.

Yes, yours won't work - because how will you know whether there is a
newer version without checking - which opens the risk of a /. effect!

> Yes it does, because it means that as soon as the first request moves the new
> data away from the "epi-center", Requests that find that data do not have to 
> go
> all the way to that "epi-center".

Of course they will, they will have to if they want to be sure that the
data has not been updated *again*!

> This is exactly the same as the reason why
> nodes do not sink on a new insert! (which is heart of my idea - make updates
> work like a normal insert to as large an extent as is possible).

No it isn't, the only way (with your mechanism) to be sure that you have
the latest version of the data is to always ignore the caches (which
causes a /. effect), or to ignore the caches a proportion of the time
(which merely delays the /. effect).
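
The "merely delays" point can be put as simple arithmetic.  In this sketch,
bypass_fraction (the share of requests that ignore the caches) and the
capacity figure are hypothetical parameters chosen for illustration.

```python
import math


def epicentre_load(total_requests_per_second: float, bypass_fraction: float) -> float:
    """Requests per second that ignore caches and so reach the epi-centre."""
    return total_requests_per_second * bypass_fraction


def saturation_demand(capacity_rps: float, bypass_fraction: float) -> float:
    """Total demand at which the epi-centre reaches its capacity."""
    if bypass_fraction == 0:
        return math.inf  # caches absorb everything; load never reaches the node
    return capacity_rps / bypass_fraction


CAPACITY_RPS = 80.0  # assumed ceiling for the epi-centre node

always_bypass = saturation_demand(CAPACITY_RPS, 1.0)
sometimes_bypass = saturation_demand(CAPACITY_RPS, 0.25)
never_bypass = saturation_demand(CAPACITY_RPS, 0.0)
```

Bypassing the caches a quarter of the time only moves the breaking point from
80 to 320 requests per second of total demand.  The load on the epi-centre
still grows in proportion to the total number of requests unless the fraction
is zero.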

Ian.
