@dloyer123: Yes, please let us know your benchmark results. Since I'm also thinking about a hardware upgrade, it would be helpful to see them ...
@Tomasz: thanks for your detailed explanations!

Best regards,
Markus

--- In [email protected], "dloyer123" <[EMAIL PROTECTED]> wrote:
>
> Nice, tight loop. It is good to see someone who has made the effort
> to make the most out of every cycle, and the result shows.
>
> My new E8400 (45nm 3GHz, dual core) system should arrive tomorrow.
> The first thing I will do is benchmark it running ami. I run
> portfolio backtests over a few years of 5 minute data over a thousand
> or so symbols. Plenty of data to overflow the cache, but still fit
> in memory. No trig.
>
> I'll post what I find.
>
> If what you say is true, and one core alone fills the memory
> bandwidth, then there should be a net loss of performance while
> running two copies of ami.
>
> --- In [email protected], "Tomasz Janeczko" <groups@> wrote:
> >
> > Hello,
> >
> > FYI: a SINGLE processor core running an AFL formula is able to
> > saturate memory bandwidth in the majority of the most common
> > operations/functions if the total size of the arrays used in a given
> > formula exceeds the DATA cache size.
> >
> > You need to understand that AFL runs at native assembly speed
> > when using array operations.
> > A simple array multiplication like this
> >
> > X = Close * H; // array multiplication
> >
> > gets compiled to just 8 assembly instructions:
> >
> > loop:    8B 54 24 58  mov  edx,dword ptr [esp+58h]
> > 00465068 46           inc  esi                      ; increase counters
> > 00465069 83 C0 04     add  eax,4
> > 0046506C 3B F7        cmp  esi,edi
> > 0046506E D9 44 B2 FC  fld  dword ptr [edx+esi*4-4]  ; get element of close array
> > 00465072 D8 4C 08 FC  fmul dword ptr [eax+ecx-4]    ; multiply by element of high array
> > 00465076 D9 58 FC     fstp dword ptr [eax-4]        ; store result
> > 00465079 7C E9        jl   loop                     ; continue until all elements are processed
> >
> > As you can see, there are three 4-byte memory accesses per loop
> > iteration (2 reads, each 4 bytes long, and 1 write, 4 bytes long).
> >
> > On my (2 year old) 2GHz Athlon x2 64, a single iteration of this loop
> > takes 6 nanoseconds (see benchmark code below).
> > So, during 6 nanoseconds we have 8 bytes read and 4 bytes stored.
> > That's (8/(6e-9)) bytes per second = 1333 MB per second read
> > and 667 MB per second write simultaneously, i.e. 2GB/sec combined!
> >
> > Now if you look at memory benchmarks:
> > http://community.compuserve.com/n/docs/docDownload.aspx?webtag=ws-pchardware&guid=6827f836-8c33-4063-aaf5-c93605dd1dc6
> > you will see that 2GB/s is THE LIMIT of system memory speed on
> > Athlon x64 (DDR2 dual channel).
> > And that's considering the fact that Athlon has a superior-to-Intel
> > on-die integrated memory controller (HyperTransport).
> >
> > // benchmark code - for accurate results run it on LARGE arrays
> > // (intraday database, 1-minute interval, 50K bars or more)
> > GetPerformanceCounter(1);
> > for(k = 0; k < 1000; k++ ) X = C * H;
> > "Time per single iteration [s]="+1e-3*GetPerformanceCounter()/(1000*BarCount);
> >
> > Only really complex operations that use *lots* of FPU (floating
> > point) cycles, such as trigonometric (sin/cos/tan) functions, are
> > slow enough for the memory to keep up.
> >
> > Of course one may say that I am using an "old" processor and that new
> > computers have faster RAM, and that's true,
> > but processor speeds increase FASTER than bus speeds, and the gap
> > between processor and RAM becomes larger and larger, so with newer
> > CPUs the situation will be worse, not better.
> >
> > Best regards,
> > Tomasz Janeczko
> > amibroker.com
> > ----- Original Message -----
> > From: "dloyer123" <dloyer123@>
> > To: <[email protected]>
> > Sent: Tuesday, May 13, 2008 5:02 PM
> > Subject: [amibroker] Re: Dual-core vs. quad-core
> >
> > > All of the cores have to share the same front-side bus and
> > > northbridge. The northbridge connects the CPU to memory and has
> > > limited bandwidth.
> > >
> > > If several cores are running memory-hungry applications, the
> > > front-side bus will saturate.
> > >
> > > The L2 cache helps for most applications, but not if you are
> > > burning through a few GB of quote data. The L2 cache is just 4-8MB.
> > >
> > > The newer multi-core systems have much faster front-side buses, and
> > > that trend is likely to continue.
> > >
> > > So, it would be nice if AMI could support running on multiple cores,
> > > even if it was just running different optimization passes on
> > > different cores. That would saturate the front-side bus, but take
> > > advantage of all of the memory bandwidth you have. It would really
> > > help those multi-day walkforward runs.
> > >
> > > --- In [email protected], "markhoff" <markhoff@> wrote:
> > >>
> > >> If you have a runtime penalty when running 2 independent AB jobs on a
> > >> Core Duo CPU, it might be caused by too little memory (swapping to disk)
> > >> or by other tasks that are also running (e.g. a web browser, audio
> > >> streamer or whatever). You can check this with a process explorer,
> > >> which shows each task's CPU utilisation. Similarly, 4 AB jobs on a Core
> > >> Quad should have nearly no penalty in runtime.
> > >>
> > >> Tomasz stated that multi-thread optimization does not scale well with
> > >> the number of CPUs, but it is not clear to me why this is the case. In my
> > >> understanding, AA optimization is a sequential process of running the
> > >> same AFL script with different parameters. If I have an AFL with a
> > >> significantly long runtime per optimization step (e.g. 1 minute), the
> > >> overhead for the multi-threading should become quite small, and
> > >> independent tasks should scale nearly with the number of CPUs (as long
> > >> as there is sufficient memory; n threads might need n times more
> > >> memory than a single thread). For sure the situation is different if
> > >> my single optimization run takes only a few milliseconds or seconds; then
> > >> the overhead for multi-thread management goes up ...
> > >>
> > >> Maybe Tomasz can give some detailed comments on that issue?
> > >>
> > >> Best regards,
> > >> Markus
> > >>
> > >
> > > ------------------------------------
> > >
> > > Please note that this group is for discussion between users only.
> > >
> > > To get support from AmiBroker please send an e-mail directly to
> > > SUPPORT {at} amibroker.com
> > >
> > > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
> > > http://www.amibroker.com/devlog/
> > >
> > > For other support material please check also:
> > > http://www.amibroker.com/support.html
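
For anyone who wants to re-check the bandwidth arithmetic from Tomasz's message, here is a minimal Python sketch. The 6 ns per iteration, the 4-byte element size, the 2 reads + 1 write per iteration and the 2 GB/s memory limit are taken from the thread. The helper name, the constant names and the decimal convention (1 MB = 10^6 bytes) are assumptions made here for illustration only, and the last figure is just the ratio of the two quoted numbers, not a measurement.

```python
# Sketch of the memory-traffic estimate for the AFL statement X = Close * H.
# Figures from the thread: 2 reads + 1 write of 4 bytes per loop iteration,
# 6 ns per iteration, roughly 2 GB/s system memory limit.
# Helper and constant names are hypothetical; 1 MB is taken as 1e6 bytes.

def bandwidth_mb_per_s(bytes_per_iteration: float, seconds_per_iteration: float) -> float:
    """Return sustained bandwidth in MB/s, with 1 MB = 1e6 bytes."""
    return bytes_per_iteration / seconds_per_iteration / 1e6

ITERATION_TIME_S = 6e-9      # time per loop iteration quoted in the thread
ELEMENT_BYTES = 4            # one single-precision float
MEMORY_LIMIT_MB_S = 2000.0   # memory limit quoted in the thread (2 GB/s)

read_mb_s = bandwidth_mb_per_s(2 * ELEMENT_BYTES, ITERATION_TIME_S)   # Close and H
write_mb_s = bandwidth_mb_per_s(1 * ELEMENT_BYTES, ITERATION_TIME_S)  # X
total_mb_s = read_mb_s + write_mb_s

# Number of cores running this loop that the quoted limit could feed.
cores_until_saturation = MEMORY_LIMIT_MB_S / total_mb_s

print(f"read  : {read_mb_s:.0f} MB/s")
print(f"write : {write_mb_s:.0f} MB/s")
print(f"total : {total_mb_s:.0f} MB/s")
print(f"cores until saturation: {cores_until_saturation:.2f}")
```

With the numbers quoted in the thread the ratio comes out at about one core, which is the basis for the expectation that a second copy running the same kind of array operation would gain little. A different time per iteration measured on other hardware can be substituted for ITERATION_TIME_S.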
