On Wednesday 16 July 2008 17:28, Ian Clarke wrote:
> Thanks for the feedback.
> 
> You are right that Freenet currently uses way too much CPU, although
> Toad tells me that he has done quite a bit of CPU profiling to try to
> track down this problem.  I'd be interested in his latest findings in
> this regard.

I haven't done much profiling recently; IIRC sdiz has done some. AFAICS (I may 
be wrong, this was a while ago), all the low-hanging fruit has been dealt 
with, leaving:
- Reduce the number of threads we use.
- Reduce the amount of RAM we use.
- Reduce the amount of I/O we do. (Not strictly CPU, of course, but relevant 
to overall performance impact).

Generally when I have seen high CPU usage it has been due to running out of 
memory, and generally memory usage is caused by the datastore and the client 
layer.

Sdiz's new datastore should reduce memory usage, I/O, threads, and complexity, 
as well as having benefits in other areas (e.g. no store rebuilding). I 
believe it is rapidly approaching mergeability.

My db4o branch moves the client layer into a lightweight database, hopefully 
reducing RAM usage for a large queue, enabling faster startup, and 
effectively implementing paging on demand of requests from disk. The downside 
is that it may result in more disk I/O, especially if the database doesn't 
fit in RAM, but even if it does. This should be offset to some degree by 
sdiz's work. And if we don't have enough memory allocated to cache everything 
in RAM (which will be the norm), it will also result in more dynamic garbage 
collection (i.e. objects being pulled from the database, used, and then 
garbage collected). This may result in higher CPU usage, cache thrashing etc. 
However, if we don't do this, we run the risk of continual 100% CPU usage and 
outright crashing on larger queues with lower memory limits.

Apart from these, there are things we could do for further optimisation. There 
are a few areas where we still have one thread whose sole purpose is to wait 
for another thread to finish a task and then call a callback. Obviously we 
should get rid of these:
- RequestStarter$SenderThread is the main offender. We call the node, it 
creates a separate thread to run the request, which eventually notifies us... 
There may also be some places where we do similar things with data transfer. 
Another notable one is "Finish CHK transfer".
- Bulk data transfer is very simple and could be implemented threadlessly 
without a major impact on code clarity.
- Block transfer is unnecessarily complex and should be converted so that the 
above applies.
- A small number of places use a thread where they don't need to, e.g. 
BackgroundBlockEncoder, IPAddressDetector.
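
To illustrate the "one thread waits for another, then calls a callback" 
pattern and how it could be removed, here is a minimal Java sketch. It is not 
Freenet code: CallbackOnCompletion and runThenNotify are hypothetical names, 
and the thread pool is only a stand-in for whatever actually runs the request. 
The worker that runs the task invokes the callback itself, so no second thread 
is parked waiting for it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical illustration, not Freenet code.
public class CallbackOnCompletion {

    // Runs the task on the pool. The worker thread that ran the task calls
    // the callback directly, so no extra thread blocks waiting for the result.
    public static <T> void runThenNotify(ExecutorService pool,
                                         Callable<T> task,
                                         Consumer<T> onSuccess,
                                         Consumer<Exception> onFailure) {
        pool.execute(() -> {
            T result;
            try {
                result = task.call();
            } catch (Exception e) {
                onFailure.accept(e);
                return;
            }
            onSuccess.accept(result);
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // The latch exists only so this demo does not exit before the callback runs.
        CountDownLatch done = new CountDownLatch(1);
        CallbackOnCompletion.<String>runThenNotify(
            pool,
            () -> "fetched",                       // stand-in for running a request
            r -> { System.out.println("callback got: " + r); done.countDown(); },
            e -> { System.out.println("failed: " + e); done.countDown(); });
        done.await(5, TimeUnit.SECONDS);
        pool.shutdown();
    }
}
```

The trade-off is that the callback now runs on the worker thread, so it has to 
be short and must not block, or it ties up the pool.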

> 
> What you are describing is basically a "callback" approach to
> concurrency.  Earlier incarnations of Freenet employed this approach,
> but we found that it led to spaghetti code because the logical flow
> "get request, send request, wait for response, send response etc" was
> spread out among different methods in different parts of the code,
> making it difficult to follow and reason about.

We do in fact use callbacks extensively, but not for low level requests.

I am not entirely convinced that state machines have to be ugly/evil, but 
that is the general assumption at the moment.
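
As a rough illustration of what a state machine for the quoted "get request, 
send request, wait for response, send response" flow might look like, here is 
a minimal Java sketch. It is an assumption for discussion only, not how 
Freenet handles requests: RequestStateMachine, its states, and its events are 
all hypothetical names. The whole flow lives in one handle() method, and 
events are fed in by a single loop instead of one blocked thread per request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration, not Freenet code.
public class RequestStateMachine {
    public enum State { AWAITING_REQUEST, AWAITING_RESPONSE, DONE }
    public enum Event { REQUEST_RECEIVED, RESPONSE_RECEIVED }

    private State state = State.AWAITING_REQUEST;
    // Records the actions taken, standing in for real network sends.
    private final List<String> actions = new ArrayList<>();

    // The whole logical flow is visible in this one method.
    public void handle(Event event) {
        switch (state) {
            case AWAITING_REQUEST:
                if (event == Event.REQUEST_RECEIVED) {
                    actions.add("forward request");
                    state = State.AWAITING_RESPONSE;
                    return;
                }
                break;
            case AWAITING_RESPONSE:
                if (event == Event.RESPONSE_RECEIVED) {
                    actions.add("return response");
                    state = State.DONE;
                    return;
                }
                break;
            default:
                break; // DONE accepts no further events
        }
        throw new IllegalStateException(event + " not valid in state " + state);
    }

    public State state() { return state; }
    public List<String> actions() { return actions; }

    public static void main(String[] args) {
        RequestStateMachine machine = new RequestStateMachine();
        // A single thread drains the event queue; no thread blocks per request.
        Queue<Event> events = new ArrayDeque<>();
        events.add(Event.REQUEST_RECEIVED);
        events.add(Event.RESPONSE_RECEIVED);
        while (!events.isEmpty()) {
            machine.handle(events.poll());
        }
        System.out.println(machine.actions() + " -> " + machine.state());
    }
}
```

Whether this stays readable once timeouts, retries, and rerouting are added is 
exactly the open question.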

Another interesting point is that FOAF routing should cut path lengths in half 
on average, meaning requests last for less time ... but does that mean we'll 
have twice as many running?
> 
> We thus decided to opt for a message-passing approach, inherited from
> Dijjer.  This is described here:
> 
>   http://code.google.com/p/dijjer/wiki/MessagingLayer
> 
> The nice thing about this approach is that the code maps closely to
> the logical "flow", making it much easier to write, read, and debug.
> 
> Ideally this would be implemented using very light-weight threads,
> perhaps through use of continuations, however as an interim measure we
> implemented it using full Java threads because there is as-yet no
> built-in support for continuations in Java.
> 
> We should investigate whether this use of full Java threads is a
> source of CPU overload, although whenever I've discussed Toad's
> profiling efforts, I don't recall that this was the case.
> 
> If it turns out that threads are the problem, there are a few Java
> implementations of continuations we could explore, see
> http://lambda-the-ultimate.org/node/1002 - although they all rely on
> voodoo like bytecode rewriting and this makes me worry that we could
> wind up with some very tricky bugs.

Yeah, I'd prefer not to use continuations unless we really have to.

One final point: Are gamers ever happy with any background app? If they are 
hardcore gamers then they will run some weird stripped-down version of 
Windows XP they downloaded from the internet to get an extra 1 fps...

One thing we could do for gamers is provide a system tray icon which could be 
used to easily turn Freenet off when gaming ... of course if 90% of their 
system uptime is gaming, this may not help us much!
> 
> Ian.
> 
> On Wed, Jul 16, 2008 at 9:08 AM, Cory Nelson <phrosty at gmail.com> wrote:
> > Hey guys,
> >
> > I know my criticism of Freenet probably makes me a bit unpopular with
> > you, but I hope you at least know I stick around because I like what
> > Freenet stands for and am just frustrated from wanting it to get
> > better.  If I had the time I would spend some learning Java to send
> > patches, but I do not so I'll propose some of what I'd like to see
> > here.
> >
> > The biggest reason I have no friends who use Freenet, and am therefore
> > stuck on opennet, is because it takes up too much CPU.  For people who
> > run demanding apps like games, Freenet simply isn't low-profile enough
> > to keep running without a really powerful box.  And really, Freenet
> > doesn't actually do a whole lot so it shouldn't ever *need* a powerful
> > box.  Having built highly scalable, performant daemons in C/C++/C#, I
> > hope I can give some general advice without knowing Java.
> >
> > Some key ideas for highly scalable software:
> >
> > a) Use as few threads as possible.  This is important for two reasons:
> >
> >    aa) If a thread locks and goes into a context switch before
> > unlocking, all the other threads waiting for the lock are stuck doing
> > nothing until the other thread wakes up to unlock.  The more threads
> > there are, the more chances there are of this happening.  Not doing
> > work sucks, but this sucks even more:  most synchronization stuff will
> > spin for a short time in user-mode in hopes of the locked thread
> > completing quickly and avoiding a context switch.  This usually
> > greatly improves performance.  Making these threads do long waits
> > prevents the sync primitives from using this optimization, so they
> > allocate kernel objects to wait on, which is significantly slower and
> > increases memory usage.  This is a *huge* scalability killer.
> >
> >    ab) Let's say that a context switch involves bringing in at least
> > 256 bytes worth of cache lines: thread state, CPU state, stack space,
> > etc.  With Freenet having 250 threads open -- which for me is about
> > average with an empty download queue and FMS running -- this means
> > 64KB is being constantly shuffled into cache.  This is detrimental to
> > performance of the entire system, especially for low-end systems with
> > 512KB or less cache.
> >
> > b) Data that is used frequently by multiple threads should be kept out
> > of the same cache line (modern cache line size is 64 bytes).  Sharing
> > data can make the CPU's cache coherency logic trigger cross-talk.
> > I've seen this alone hurt app performance by as much as a 20% slowdown
> > on a Core2 Quad.
> >
> > c) Work with buffers aligned for the operating system's memory
> > manager.  Windows locks a full page in memory when you use one for
> > I/O, so try to keep I/O buffers aligned to 4KB addresses and size.  I
> > realize Java doesn't give you any way to control the address, so
> > hopefully it will do the right thing here.
> >
> > The best architecture I have found for scalability involves a
> > "completion queue".  This is basically just a queue of callbacks that
> > a thread constantly pops from.  I/O is all done asynchronously, and
> > pushes a callback onto the queue when the I/O is complete.  The
> > callback can then initiate more I/O, hash stuff, or do whatever it
> > needs to.  Locks must always be released before a callback finishes,
> > or deadlocks would occur.  This gives clean code that is basically a
> > chain of methods:
> >
> > void main() {
> >   queue q;
> >
> >   begin_request();
> >   for(;;) {
> >      callback = q.pop();
> >      callback();
> >   }
> > }
> >
> > void begin_request() {
> >   // on_send will be called once the send has finished.
> >   begin_send(buffer, len, on_send);
> > }
> >
> > void on_send(int error, int transferred_bytes) {
> >   if(error) ...
> >   else {
> >      buffer += transferred_bytes;
> >      len -= transferred_bytes;
> >
> >      if(len) begin_send(buffer, len, on_send);
> >      else ...
> >   }
> > }
> >
> > The design is quite different from blocking code, but I would argue it
> > is just as simple, and can even result in much cleaner code due to
> > splitting an otherwise huge method into smaller operation-centric
> > methods which are more easily understood.
> >
> > This would open up two design choices:
> > a) a single thread per logical processor, each running the loop in
> > main() above, which can scale fantastically.
> > b) a single thread, period.  all locks could be removed from code,
> > greatly simplifying it and removing any chance of deadlock, priority
> > inversion, etc.
> >
> > I have no idea if Java is capable of this.  Toad seemed to indicate it
> > isn't, but I thought I'd share my ideas anyway.  I've heard Apache's
> > MINA mentioned a few times, maybe that can help.
> >
> > --
> > Cory Nelson
> > _______________________________________________
> > Devl mailing list
> > Devl at freenetproject.org
> > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
> >
> 
> 
> 
> -- 
> Email: ian at uprizer.com
> Cell: +1 512 422 3588
> Skype: sanity