Re: [freenet-chat] [FMS] Freenet development...

Matthew Toseland Fri, 24 Sep 2010 09:36:45 -0700

On Friday 24 September 2010 06:10:47 3BUIb3S50i wrote:
> o...@lkxpu0~cdv6dh0idyw4mbwkusgn~h~bs3qqvxyoxsay wrote :
> > episodi...@mtnvoeldfh6zd61fk1br94hikbkstf2xsmvqfts16lc wrote:
> >
> >> episodi...@mtnvoeldfh6zd61fk1br94hikbkstf2xsmvqfts16lc wrote :
> >>> o...@lkxpu0~cdv6dh0idyw4mbwkusgn~h~bs3qqvxyoxsay wrote :
> >>>> is dead, right?
> >>>
> >>> last commit to fred-staging: Fri Sep 17 (6 days ago, that is)
> >>> last commit from toad: Sat Sep 11 (12 days ago)
> >>>
> >>> Let's hope no health issue is involved in here.
> >>
> >> that was on master branch, toad is currently working on
> >> 'new-load-management' branch (multiple commits today)
> >
> > It will not work.
> >
> > Just like cool down queue.
> > Rewritten but performs as bad as before.
> >
> > Bad backed off rate, bad download rate, bad local download rate, turtles
> > everywhere, bad ping time, bad success rates by HTL, bad pReject for every
> > peer, bad payload output, stalled inserts, bad queue management, internal
> > errors for downloads, hash mismatches for downloads,...
> > did I miss any bug that has not been solved for months now?
> >
> > With almost every build situation got worse.
> > 30 broken builds since 1247, absolutely no improvement.
> >
> > Providing log entries, in vaine.
> > Inserting test files, in vaine.
> >
> > Toad doesn't use freenet, he likes to write new code but he doesn't
> > investigate and fix bugs.
> > He's on the run.
> >
> > Development is dead.
> 
Let's have 0.4 back then. Complete with DataStoreBug, non-working routing, and 
all the other fun.


Seriously, I do care about Freenet. And I've had a pretty difficult year one 
way or another.

And most of the problems you mention are not trivial bugs to be fixed by 
changing one line of code. The only line of code that could have caused the 
load management issues was changed back in 1277 and, surprise surprise, 
everything is still borked.

Hash mismatches for downloads and internal errors for downloads are fixed as 
far as I know. Or at least, are sufficiently rare as to be of much lower 
priority than the main network level problem - slow downloads and high backoffs.

If you are still getting high pRejects, please tell me what the CAUSE is: see 
"Preemptive reject reasons" on the stats page in advanced mode. There are 3 
common reasons afaics:

Output bandwidth liability - This should be the most common reason. If so, the 
AIMDs still haven't adjusted: too many requests (or inserts?) are still being 
issued.

Thread limit - If this is the most common reason, I need to know. If it only 
happens on nodes with very high bandwidth then IMHO it is not a critical issue. 
However if it is happening on many average nodes, we *really* need to know, 
because we can reduce the amount of healing further and/or expedite threadless 
transfers.

Ping times - This is generally caused by CPU usage or local network problems. 
If the CPU usage is caused by your node, it might be due to not having enough 
memory or similar issues. If the node is using a lot of CPU itself, something 
interesting might be happening. Various abuses of the network seem to be 
related to high ping times too. And QoS can also cause this. In any case, THIS 
IS BAD, it makes it difficult for Freenet to manage load properly, and it's 
generally not caused directly by Freenet itself.
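
As a rough illustration of the AIMD behaviour referred to under output bandwidth liability above: an additive-increase/multiplicative-decrease controller grows its request window slowly while requests are accepted and cuts it sharply when one is rejected. The class name, method names and constants below are hypothetical and are not taken from the Freenet source; this is only a minimal sketch of the general technique, not how the node actually does it.

```python
class AimdWindow:
    """Generic AIMD request window (illustrative only, not Freenet code)."""

    def __init__(self, initial=1.0, increase=1.0, decrease=0.5, minimum=1.0):
        self.window = initial      # how many requests we allow in flight
        self.increase = increase   # additive step, spread over one full window
        self.decrease = decrease   # multiplicative factor applied on rejection
        self.minimum = minimum     # never shrink below this

    def on_accepted(self):
        # Additive increase: about +increase per window's worth of accepts.
        self.window += self.increase / self.window

    def on_rejected(self):
        # Multiplicative decrease: back off quickly when the peer rejects.
        self.window = max(self.minimum, self.window * self.decrease)

    def may_send(self, in_flight):
        # Only start a new request while we are below the current window.
        return in_flight < self.window
```

If rejections keep arriving faster than the window shrinks, the sender is still issuing too many requests, which is the situation described above.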

And yes I have heard you complain over and over about your downstream bandwidth 
being lower than your upstream bandwidth. THIS IS NOT A SIMPLE BUG, any more 
than the high backoffs are a simple bug. The load management system is 
fundamentally very poor. It is not designed to guarantee that a specific 
proportion of capacity is used for your own requests, and without significant 
changes it cannot do so (nor for any other peer; it is fundamentally unfair and 
therefore vulnerable to at least local DoS). Plus, it is not clear that it 
would be good for the network if that proportion was too high anyway. But it 
will be shared fairly, and will be configurable, when the new code lands.

To specifically address the issue of the cooldown queue, one of the more 
serious usability problems for Freenet has always been (at least since the db4o 
branch landed, and before that we had other problems i.e. no download resuming) 
that it uses a scary amount of disk I/O, especially if you have a lot of 
downloads queued. The cooldown queue changes were intended to radically reduce 
this, and as far as I can see without a detailed investigation they did. 
Provided there is enough memory to cache the whole node.db4o file in RAM (turn 
on defrag on startup to help with this), we should only have disk writes when 
there is actual progress being made with downloads, and the number of reads 
should be substantially reduced too. If you are looking at stats, please try to 
look at the number of reads/writes that *actually reach the disk* (e.g. r/s and 
w/s in iostat -x <number>), and ignore the number of blocks, as reading (or 
writing) 512 bytes or 32KB is almost identical performance-wise. 
Further improvements are planned but can't be implemented until after disk 
crypto is sorted out (either fixed or removed). I have spent considerable time 
debugging the cooldown queue changes and afaics it is working well enough that 
the load problems are a higher priority.
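
For anyone who wants the operation counts rather than the block counts mentioned above without running iostat, here is a minimal sketch that derives reads and writes per second from two samples of Linux's /proc/diskstats. It assumes the usual Linux layout (device name in the third column, reads completed in the fourth, writes completed in the eighth); the function names and the "sda" device in the usage note are placeholders, not anything from Freenet.

```python
import time


def completed_ops(diskstats_text, device):
    """Return (reads_completed, writes_completed) for one device.

    Assumes Linux /proc/diskstats layout: major, minor, name, then counters.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]), int(fields[7])
    raise KeyError(device)


def ops_per_second(before, after, interval_seconds):
    """Turn two (reads, writes) samples into per-second rates."""
    reads = (after[0] - before[0]) / interval_seconds
    writes = (after[1] - before[1]) / interval_seconds
    return reads, writes


def sample(device, interval_seconds=5.0):
    """Sample the live counters twice; only works on a Linux machine."""
    with open("/proc/diskstats") as f:
        before = completed_ops(f.read(), device)
    time.sleep(interval_seconds)
    with open("/proc/diskstats") as f:
        after = completed_ops(f.read(), device)
    return ops_per_second(before, after, interval_seconds)
```

Calling sample("sda") on a Linux box returns a (reads per second, writes per second) pair for that device over the sampling interval; these count operations, not blocks, which is the figure that matters here.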

It is not true that I don't investigate bugs. I did spend a significant amount 
of time investigating various of the current problems, and was frequently led on 
wild goose chases, for example with the client layer bugs before the cooldown 
queue. But when something is fundamentally broken by design and will have to be 
rewritten before release anyway, and where that rewrite is likely to 
substantially improve things, it simply makes sense to do the rewrite. It is 
true that this can sometimes be disruptive in itself. But on the other hand if 
everything is busted anyway it may be better to solve not only the existing 
bugs that everyone is complaining about but also the long term issues (such as 
heavy disk I/O) at the same time, by doing the long-overdue rewrite - even if 
that risks creating more, different bugs.

Now, to talk specifically about the load management code: The current load 
management code is broken by design. It is true that it worked for a while, and 
it is true that it is not immediately clear why it doesn't seem to be working 
well now. However it is also true that by design it involves a rather large 
amount of misrouting (including routing to the right node and then rejecting 
because of load), which IMHO is the real reason why data persistence is so poor 
(with knock-on effects for performance on just about everything). It works very 
badly on darknet, it works very badly with fast peers, it can't cope well with 
lots of new nodes, and it generally just works very badly. If it works 
reasonably (NOT WELL) in practice for a while this is the product of luck 
rather than good design.

It is also fairly vulnerable. There is an outside possibility that the current 
problems are malicious in nature; we've seen discussions on #freenet-chat 
turned into 0day exploits on Frost, and this may have happened again. However 
it is more likely IMHO that it is just broken - mainly because any funded 
attacker wouldn't want to DoS the entire network, they'd rather surveil it, 
which unfortunately is rather easy with even a relatively large opennet. But if 
we make Freenet work well enough to have a much larger number of users, darknet 
hopefully will solve that problem.

The discussions we've had over the years have brought me to the point where I 
am fairly confident I can replace it with something which works much better.

The current new-load-management branch is still very much a work in progress. 
During the process I have discovered a number of bugs which are not specific to 
load management but which are closely related to it, and these will be deployed 
first - e.g. in the block transfer and message queue code. Currently I am 
working on code related to timeouts: the new load management relies on knowing 
how many of our requests are running on the node we are sending them to, so 
what we do now (timing out and hoping that the node doesn't continue the 
request and thus multiply load, which of course it will in many cases) is a 
really bad idea.
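
To make the timeout point concrete, here is a small sketch of the bookkeeping idea: a request keeps counting against a peer until the peer itself reports that it has finished, even if we stopped waiting locally. The class and method names are hypothetical and this is not the design on the new-load-management branch, only an illustration of why a local timeout must not free the slot.

```python
class PeerRequestTracker:
    """Counts requests we believe are still running on one peer (illustrative)."""

    def __init__(self):
        self.running = set()    # requests the peer may still be working on
        self.timed_out = set()  # subset we have given up waiting for locally

    def on_sent(self, request_id):
        self.running.add(request_id)

    def on_local_timeout(self, request_id):
        # We stop waiting, but the peer may well carry on with the request,
        # so it still counts towards the peer's load.
        if request_id in self.running:
            self.timed_out.add(request_id)

    def on_peer_finished(self, request_id):
        # Only a terminal message from the peer releases the slot.
        self.running.discard(request_id)
        self.timed_out.discard(request_id)

    def load(self):
        return len(self.running)
```

With this accounting, rerouting a timed-out request elsewhere shows up as extra load rather than being hidden, which is the multiplication of load described above.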

In summary, if you think I am a negative influence and nobody else is 
contributing significant code, you can either go away or you can contribute 
yourself. If you think that my dominant position makes any contributions you 
make liable to be sabotaged, and that all the work I've done in the last 8 
years, starting with eliminating the DataStoreBug (I bet you don't even 
remember that one, do you?) and getting 0.5 out the door, has been worthless, 
feel free to fork some ancient pre-September-2002 version of Freenet. It will 
work spectacularly as long as your network is tiny; such forks always have. Or 
rewrite it in C, O'Caml, Python, ARM assembler language, brainfart or whatever; 
see you in 8 years. And feel free to believe I am working for the NSA to 
sabotage Freenet, or for Google to do God knows what (that one is true, they've 
donated the bulk of our total funds over the last two years, but they've never 
asked for anything of substance; my guess is they are interested in 1) being a 
friend to open source, 2) cheaply evaluating some potentially interesting 
technology, and 3) hostile environments, some of whom they've had very obvious 
political issues with recently). Meanwhile I will continue to try to improve 
Freenet based on what I see (with Ian and the community) as being its biggest 
problems. IMHO, despite the problems we have been having lately, over the last 
two years in particular we have made major progress both on implementation and 
on understanding the nature of the challenges facing us, thanks particularly to 
Evan (who sadly has disappeared) but also to others. We have solved some of 
them already (e.g. the client layer changes were planned for years, and if the 
network wasn't so messed up, cross-segment FEC for one ought to have a big 
impact) and have a good chance of making major progress in the relatively near 
future IMHO.


_______________________________________________
chat mailing list
chat@freenetproject.org
Archived: http://news.gmane.org/gmane.network.freenet.general
Unsubscribe at http://emu.freenetproject.org/cgi-bin/mailman/listinfo/chat
Or mailto:chat-requ...@freenetproject.org?subject=unsubscribe
