Re: hyper threading.
On Mar 26, 2005, at 2:39 PM, [EMAIL PROTECTED] wrote:
> This is the kind of disinformation I have been referring to

What in particular are you referring to?
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: hyper threading.
On Mar 26, 2005, at 5:33 PM, [EMAIL PROTECTED] wrote:
> Yes, the theory is very nice; you've done a nice job reading Intel's marketing garb.

What theory? All I see is "On Mar 26, 2005, at 5:33 PM, [EMAIL PROTECTED] wrote:"
Re: hyper threading.
If you think that then you are either a fool or an old fool..

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Tue, 29 Mar 2005 06:43:59 +0200
Subject: Re: hyper threading.

[EMAIL PROTECTED] writes:
> And the circumstances that you have described have nothing to do with modern computing, so as I said, it's irrelevant.

The circumstances have not changed in modern computing. That's one reason why 30-year-old operating systems like UNIX remain popular.

--
Anthony
Re: hyper threading.
[EMAIL PROTECTED] writes:
> If you think that then you are either a fool or an old fool..

I've never encountered a situation in which experience was a disadvantage.

--
Anthony
Re: hyper threading.
That's because you seem unable to grasp modern concepts. If you think that performance criteria of modern controllers and processors are the same as 30 years ago, then you are incapable of commenting on anything modern. Every controller/processor is different and has its own advantages and inefficiencies. The fact that you can make ignorant statements like "I proved polling is faster because 20 years ago I wrote a driver" shows that you think you know things that you don't.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Tue, 29 Mar 2005 21:02:40 +0200
Subject: Re: hyper threading.

[EMAIL PROTECTED] writes:
> If you think that then you are either a fool or an old fool..

I've never encountered a situation in which experience was a disadvantage.

--
Anthony
Re: hyper threading.
Stop feeding this troll. He has been banned from the DragonFly BSD list for his stupid comments, and his e-mail address doesn't even exist. His only goal is to make the longest thread of messages in history. Stop him!

On Tue, 29 Mar 2005 14:54:30 -0500, [EMAIL PROTECTED] wrote:
> That's because you seem unable to grasp modern concepts. If you think that performance criteria of modern controllers and processors are the same as 30 years ago, then you are incapable of commenting on anything modern. Every controller/processor is different and has its own advantages and inefficiencies. The fact that you can make ignorant statements like "I proved polling is faster because 20 years ago I wrote a driver" shows that you think you know things that you don't.

--
Guillermo García Rojas Covarrubias
Director General
SoloBSD
http://www.solobsd.org
Re: hyper threading.
You are wrong about just about everything. I unsubscribed because DragonFly BSD is more than a year away from being usable in a commercial environment, and memory fails when you shock it with a heavy load. And I'm pretty sure my email exists. My goal is to seek intelligent life. It's a long journey.

-----Original Message-----
From: Guillermo Garcia-Rojas [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Tue, 29 Mar 2005 14:03:15 -0600
Subject: Re: hyper threading.

Stop feeding this troll, he has been banned from the DragonFly BSD list for his stupid comments, his e-mail address doesn't even exist. His only goal is to make the longest thread of messages in history. Stop him!
Re: hyper threading.
[EMAIL PROTECTED] writes:
> That's because you seem unable to grasp modern concepts.

None were under discussion.

> If you think that performance criteria of modern controllers and processors are the same as 30 years ago, then you are incapable of commenting on anything modern.

The principles of modern controllers are surprisingly similar to those of old controllers. The biggest change is that the PC world is only now discovering what mainframe designers knew 40 years ago.

--
Anthony
Re: hyper threading.
No, I think the biggest changes are that 1) processor speed is rarely the key limiting factor and 2) memory efficiency is much less of a concern. In the old days, if you weren't a very good programmer you did something else. Today anyone can crank out code that works (Linux, anyone?). And processors are so fast that most people don't notice, as is evidenced by this thread.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Tue, 29 Mar 2005 22:20:31 +0200
Subject: Re: hyper threading.

[EMAIL PROTECTED] writes:
> That's because you seem unable to grasp modern concepts.

None were under discussion.

> If you think that performance criteria of modern controllers and processors are the same as 30 years ago, then you are incapable of commenting on anything modern.

The principles of modern controllers are surprisingly similar to those of old controllers. The biggest change is that the PC world is only now discovering what mainframe designers knew 40 years ago.

--
Anthony
Re: hyper threading.
[EMAIL PROTECTED] wrote:
> The principles of modern controllers are surprisingly similar to those of old controllers. The biggest change is that the PC world is only now discovering what mainframe designers knew 40 years ago.

PC designers knew it 20 years ago. When I designed the Specialix SI serial boards (for 286/386 Xenix boxes) they had interrupt throttling built in (circa 1986/7).

John
Re: hyper threading.
On Tue, 2005-03-29 at 22:20 +0200, Anthony Atkielski wrote:
> [EMAIL PROTECTED] writes:
>> That's because you seem unable to grasp modern concepts.
> None were under discussion.

As far as you can see, which shows the limit of your perception.

>> If you think that performance criteria of modern controllers and processors are the same as 30 years ago, then you are incapable of commenting on anything modern.
> The principles of modern controllers are surprisingly similar to those of old controllers. The biggest change is that the PC world is only now discovering what mainframe designers knew 40 years ago.

Bollocks.
Re: hyper threading.
--- Anthony Atkielski [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] writes:
>> Polling is simply unnecessary in most cases. You could get better performance using an em driver and setting max ints to whatever is optimal for your system. Polling adds latency and overhead for no good reason.
> Polling often provides better performance, at the expense of higher overhead.

If you understood what I said, then you wouldn't say what you said, because it's just plain wrong.
Re: hyper threading.
Boris Spirialitious writes:
> If you understood what I said, then you wouldn't say what you said, because it's just plain wrong.

I've written code that proves it right. Someone once told me that an 80286 couldn't handle ordinary terminal communications at speeds of 38400 bps. I proved that it could, but the comm program I wrote to do so used polling rather than interrupts to accomplish it. It was impossible to handle such high speeds with interrupt-driven I/O.

--
Anthony
Re: hyper threading.
I guess that depends on how you define performance. The MAX_INTS setting in the em driver essentially does what polling does (in reducing interrupts) without the overhead. So there is really no way that polling could be better. With polling you have a lot of unnecessary overhead. Setting MAX_INTS properly has zero overhead for the O/S.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Mon, 28 Mar 2005 06:03:00 +0200
Subject: Re: hyper threading.

[EMAIL PROTECTED] writes:
> Polling is simply unnecessary in most cases. You could get better performance using an em driver and setting max ints to whatever is optimal for your system. Polling adds latency and overhead for no good reason.

Polling often provides better performance, at the expense of higher overhead.

--
Anthony
Re: hyper threading.
Things have changed a bit since then, so I doubt that proof has any relevance. All polling does, in the context of device polling, is make networking low-priority. You are adding latency to save CPU cycles. You could argue that higher latency is lower performance. Interrupt hold-offs are a much better way to reduce interrupts without poisoning your system with extra overhead.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Mon, 28 Mar 2005 16:49:20 +0200
Subject: Re: hyper threading.

Boris Spirialitious writes:
> If you understood what I said, then you wouldn't say what you said, because it's just plain wrong.

I've written code that proves it right. Someone once told me that an 80286 couldn't handle ordinary terminal communications at speeds of 38400 bps. I proved that it could, but the comm program I wrote to do so used polling rather than interrupts to accomplish it. It was impossible to handle such high speeds with interrupt-driven I/O.

--
Anthony
Re: hyper threading.
[EMAIL PROTECTED] writes:
> Things have changed a bit since then, so I doubt that proof has any relevance.

The principles haven't changed at all. Servicing interrupts is an extremely high-overhead activity. There's a minimum amount of time it takes, no matter how short the interrupt routine. There comes a point when just the inherent cost of the context switch is responsible for most of the overall cost of the interrupt service, and with a large number of interrupts, the processor(s) can spend a great deal of time just switching contexts.

Polling eliminates this overhead by simply checking for I/O to service when it is convenient for the OS. As long as polls occur frequently enough not to miss any pending I/O, it's faster than interrupt-driven I/O. The total number of instructions executed is often greater, because the OS tends to spin on its polling tasks, but the absolute time required to respond to a given I/O event can be much shorter.

In my case, I divided all the work of the comm program into small bits that could be done in tiny chunks. Each time a chunk was completed, I polled the serial port. Since chunks never exceeded a certain size, I always managed to poll the port in less time than it took to receive a character, even at 38,400 bps. The system was busier than it would be with interrupts driving it, but it responded more quickly to incoming traffic, and there were no transfer timeouts, whereas with interrupts, the system was less busy, but it timed out very consistently at high communications rates. By using more processor but evening out the use of processor so that it was more consistently distributed, very high communication rates could be handled by the program.

All of this remains permanently applicable today, and it is why some high-speed applications poll instead of waiting for interrupts.

--
Anthony
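The timing argument in the message above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes 8N1 framing (10 bits on the wire per character), which the message does not state, and the chunk durations are made-up numbers, not measurements from the original comm program. It computes how long one character takes to arrive at a given line rate, then checks whether a loop that polls the port after each work chunk would always poll before the next character arrives.

```python
# Assumption: 8N1 framing, i.e. 1 start bit + 8 data bits + 1 stop bit.
BITS_PER_CHAR = 10


def char_time_us(bps: int, bits_per_char: int = BITS_PER_CHAR) -> float:
    """Microseconds needed to receive one character at the given line rate."""
    return bits_per_char * 1_000_000 / bps


def polling_keeps_up(chunk_times_us, bps: int) -> bool:
    """True if every work chunk finishes before the next character can arrive.

    With an unbuffered receiver, a character is lost if it is not read before
    the following one overwrites it, so the longest chunk between two polls
    must be shorter than one character time.
    """
    return max(chunk_times_us) < char_time_us(bps)


if __name__ == "__main__":
    budget = char_time_us(38_400)
    print(f"one character every {budget:.1f} us at 38,400 bps")
    # Hypothetical chunk durations (us) for display, keyboard and transmit work.
    chunks = [120.0, 200.0, 80.0]
    print("polling keeps up:", polling_keeps_up(chunks, 38_400))
```

Under these assumptions the budget is roughly 260 microseconds per character, so the scheme only works if no chunk of display or keyboard work runs longer than that.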
Re: hyper threading.
Do you know how MAX_INTS and Device Polling work? I can tell that you don't, so why are you blabbering about how you kludged an ancient operating system to work around poorly designed hardware?

First of all, with original 8250 PC serial ports, polling wouldn't have worked because there was no buffering. So there were no chunks to deal with. Which is why someone probably told you it was impossible. If your MB had a later design, such as a 16550, then you could poll and gain some efficiency.

HOWEVER, modern controllers have much buffering, and the ability to moderate interrupts. With polling you have a minimum constant overhead, even with no traffic. Using interrupt moderation, you get the best of both worlds, because the controllers will only interrupt at a pre-set safe interval, and there is no additional overhead. And when there is no traffic there are no interrupts. So if you have good hardware, polling has negative effects on performance. It adds overhead for no additional benefit.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Mon, 28 Mar 2005 20:14:52 +0200
Subject: Re: hyper threading.
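The trade-off between interrupt moderation and polling described in this message can be put in rough numbers. This is a toy model, not the em driver's actual behaviour: `max_ints` merely stands in for the MAX_INTS cap mentioned in the message, `poll_hz` is a hypothetical fixed polling frequency, and the model counts only how often the CPU is asked to service the controller, ignoring per-event cost and latency.

```python
def interrupts_per_second(pps: int, max_ints: int) -> int:
    """Service events per second with interrupt moderation.

    Model assumption: the controller raises at most `max_ints` interrupts per
    second and none at all when no packets arrive.
    """
    return min(pps, max_ints)


def polls_per_second(pps: int, poll_hz: int) -> int:
    """Service events per second with fixed-rate polling.

    The poll runs `poll_hz` times per second whether or not traffic arrives,
    so the result does not depend on `pps`.
    """
    return poll_hz


if __name__ == "__main__":
    # Hypothetical packet rates, interrupt cap and polling frequency.
    for pps in (0, 1_000, 500_000):
        moderated = interrupts_per_second(pps, max_ints=8_000)
        polled = polls_per_second(pps, poll_hz=1_000)
        print(f"{pps:>7} pps: {moderated:>5} interrupts/s vs {polled:>5} polls/s")
```

In this model the idle cost of polling is constant while moderated interrupts fall to zero with no traffic. Whether one interrupt or one poll is cheaper per event is a separate question that the model does not settle.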
Re: hyper threading.
[EMAIL PROTECTED] writes:
> Do you know how MAX_INTS and Device Polling work?

I know how device polling works. MAX_INTS is the sort of identifier that probably occurs in seven trillion lines of code in the world, so I have no idea what it means.

> I can tell that you don't, so why are you blabbering about how you kludged an ancient operating system to work around poorly designed hardware?

I didn't say anything about an operating system.

> First of all, with original 8250 PC serial ports, polling wouldn't have worked because there was no buffering.

No buffering was necessary. Even the oldest devices held the most recent character latched in the register, and that's what I picked up. It wasn't necessary to buffer the characters, as I picked them up as soon as they came in ... even at 38,400 bps.

> So there were no chunks to deal with.

The chunks I had in mind had nothing to do with the incoming serial data. They were outstanding tasks divided into small blobs that could be handled between two polls of the serial port. Most of them involved things like writing data to the display, scrolling or clearing the display, and emptying and processing the keyboard buffer, not to mention transmitting outgoing data as required.

> Which is why someone probably told you it was impossible.

They thought it was impossible because they had never thought of just polling the port. With interrupt-driven I/O, it _was_ impossible. But I just decided to stop using interrupts to eliminate that problem.

> If your MB had a later design, such as a 16550, then you could poll and gain some efficiency.

I allowed for buffered input, as I recall, but the PCs I used it on didn't have that, and it would work without it.

> HOWEVER, modern controllers have much buffering, and the ability to moderate interrupts. With polling you have a minimum constant overhead, even with no traffic.

That's right, but it's a low overhead, compared to the overhead of interrupt service.

> Using interrupt moderation, you get the best of both worlds, because the controllers will only interrupt at a pre-set safe interval, and there is no additional overhead. And when there is no traffic there are no interrupts.

I'm sure that's appropriate in some cases. In my case, it wasn't necessary.

> So if you have good hardware, polling has negative effects on performance. It adds overhead for no additional benefit.

Polling improves performance in the circumstances I've described. The extra overhead is irrelevant as long as the system is less than 100% busy.

--
Anthony
Re: hyper threading.
And the circumstances that you have described have nothing to do with modern computing, so as I said, it's irrelevant.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Tue, 29 Mar 2005 00:03:07 +0200
Subject: Re: hyper threading.
Re: hyper threading.
[EMAIL PROTECTED] writes:
> And the circumstances that you have described have nothing to do with modern computing, so as I said, it's irrelevant.

The circumstances have not changed in modern computing. That's one reason why 30-year-old operating systems like UNIX remain popular.

--
Anthony
Re: hyper threading.
Paul A. Hoadley writes:
> Here are some measurements. A few weeks ago I ran Unixbench 4.1.0 (/usr/ports/benchmarks/unixbench) on a P4 2.8GHz with and without hyperthreading enabled.
> I note a slight difference in the 10 minute load average in favour of the uniprocessor run (0.00 vs 0.10 in the hyperthreading run), though I doubt this alone could account for a 15% difference in total score.

It's not clear to what extent these measurements represent simultaneous processing.

The presumed advantage of hyperthreading resides in the ability to make better use of the processor hardware when you have more than one execution thread running AND the threads are doing entirely different things. Intel has demonstrated this by running completely different tasks at the same time on HT and non-HT systems; the HT systems consistently perform better. Both desktop and server systems can benefit from this.

However, if you run measurements that consist of a single execution thread, or several execution threads performing the same type of work, HT will probably be slower than a UP environment. In this case, HT contributes nothing because the various threads are competing for the same processor hardware at the same time, so the global instruction rate does not improve with HT--and since SMP has higher OS overhead than a uniprocessor environment, the net result is a loss of performance.

In order to profit from HT, then, you must have a mix of different tasks running on the system at the same time. This should be the normal case for most desktop and server systems, but it is never seen in benchmarks unless they are specifically designed to simulate this. Thus, while HT may help in real-world applications of servers and desktops, the only way to see this in measurements is to make sure they duplicate the type of instruction mix seen on these systems in real life.

The actual architecture of hyperthreading is pretty straightforward, and it's pretty clear that the hardware itself cannot result in degraded performance: either it improves performance, or it makes no difference. So the only question is whether or not HT improves performance enough in a real-world environment to offset the greater OS overhead of managing multiple processors. I think that with a heterogeneous instruction mix of the type likely to be seen in real-world systems, it does (admittedly not by much). In some systems that are doing a lot of homogeneous number-crunching, performance might go down, but it's difficult to imagine such a scenario for a server. Some desktops might be in that situation, if they are dedicated to single tasks (games, Mathematica, CAD, etc.).

--
Anthony
Re: hyper threading.
[EMAIL PROTECTED] writes: You can argue the technical theory all you want, but the measurements say otherwise. You have to ensure that you're doing the right measurements. FreeBSD 4.9 - Load: 38% (I put this in for fun :-) Freebsd 5.4-Pre UP (no HT) - Load: high 55-60% range FreeBSD 5.4-Pre SMP/HT - Load: 70-80% (much more jumping around) You'll find that the total CPU time required from start to finish for a single thread is ALWAYS higher for SMP than for a UP environment, even if you have separate physical processors. Several things happen when you move from a uniprocessor environment to an environment with two or more processors: - The total CPU time for each thread increases. - The total system load on a per process basis increases. - The total throughput of the system improves if there is more than one independent process running in the system. - Each of the processors runs more slowly than it would if it were the only processor running in a UP environment. If you run a single-thread benchmark on a MP system, you'll find that it runs more slowly than it does on a UP system. If you run multiple single-thread independent benchmarks on a MP system, you'll find that total CPU time for each benchmark increases over that required in a UP system--but the elapsed time required to complete all benchmarks substantially diminishes. To properly gauge the performance of a multiprocessor system, you must run a realistic mix of tasks on the system and measure overall throughput. If you do this, you'll find that you always come out ahead with multiple processors, even HT processors. Hyperthreading is just a special case of multiprocessing that imposes some additional restrictions. HT is much more sensitive to similarities in instruction mix across processes, because the actual processor hardware is being shared. 
With a sufficiently heterogenous instruction mix across multiple execution threads, this isn't a problem; but if you are running a single-threaded benchmark, or a series of identical single-threaded benchmarks, it can seriously distort your measurements. Although adding physical processors diminishes the performance of each processor, it still adds overall processing power, up to a certain point. The increment is never equal to the actual number of processors added, though; that is, if you go from one to two processors, you never get a doubling of effective processor power--it's more like 70-80%. The percentage increment gets worse with each additional processor, until you reach a point at which performance actually starts to decline (the point at which this happens is extremely hardware dependent, but it's always well beyond two processors). Hyperthreaded processors should not diminish in performance just because HT is turned on, because the hardware contention that diminishes performance in conventional MP systems is largely absent in a HT microprocessor. However, since you are really still only sharing a single processor with HT, the overall increment is much lower than it would be with two physical processors, and it is very sensitive to the instruction mix. this shows that you really are a bit foggy. Did you miss the part where with 2 processors you actually do have 2 processors? I actually read what Intel had to say on how the architecture works, and I spent years measuring systems the hard way (with hardware monitors and probes), so I know somewhat whereof I speak. Multiprocessing was always a significant hot-button issue with customers, as they always wanted to know how much they really gained with multiple processors (as opposed to what they had been promised). I can make an argument that networking with 1 processor on 5.4 is better than with 2. 
For example, with a test similar to the above, with 2 physical processors FreeBSD 5.4 will start dropping packets way before it hits 500Kpps unless you increase the interrupts/second, which of course increases the system load. And even with the dropped packets (which should reduce the load because it doesn't have to receive and transmit the packet), the load is still higher than for 4.x with a single processor. Load is not a problem, as long as it's below 100%. Since individual processors slow down in MP configurations, anything that depends on raw processor speed will suffer in an MP configuration. However, overall system throughput is greatly enhanced by running with several processors. At the same time, the total processor time required to complete all tasks is greater in an MP environment than it would be in a UP environment--it's the fact that things can run in parallel that improves the throughput. Moral: if you want to avoid dropping packets in the situation you describe, increase the interrupt rate. The additional processing power of the system will make this practical. You and many others regularly say things like SMP is obviously faster, or Opterons are noticeably faster, but those statements are only
Re: hyper threading.
[EMAIL PROTECTED] writes: When you get your machine running without a kernel let me know. The kernel is the key to the O/S. If you don't need networking and don't have many interrupts, then it probably doesn't matter that much. The kernel represents only a small part of total system utilization and throughput. Even if everything is single-threaded through the kernel, you can still get performance benefits from multiple processors, because they can run userland processes in parallel. If total system load is 5% kernel and 80% userland in a UP environment, and moving to an MP environment doubles kernel overhead, total system load has still increased by only 5 percentage points (from 85% to 90%). In general, many things must be single-threaded through the kernel because of the need for proper synchronization. Thus, the kernel always shows more negative effects from MP than the system as a whole, but since it is so small in the overall picture, MP still improves global performance. -- Anthony
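The load arithmetic in the message above can be checked with a few lines of Python. This is only a sketch: the 5% and 80% figures come from the message, while the function name and the assumption that userland load stays the same after the move to MP are illustrative.

```python
def total_load(kernel_pct, userland_pct, kernel_factor=1.0):
    """Total system load in percent when kernel overhead is scaled by kernel_factor."""
    return kernel_pct * kernel_factor + userland_pct


# Uniprocessor case from the message: 5% kernel plus 80% userland.
up_load = total_load(5.0, 80.0)

# MP case: kernel overhead doubled, userland load assumed unchanged.
mp_load = total_load(5.0, 80.0, kernel_factor=2.0)

print(f"UP load: {up_load:.0f}%")
print(f"MP load: {mp_load:.0f}%")
print(f"Increase: {mp_load - up_load:.0f} percentage points")
```

Under these assumptions, doubling a small kernel share moves the total far less than doubling suggests, which is the point being argued.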
Re: hyper threading.
Right. That's what I said. You'll kill your networking. So you don't want HT or SMP on a server. That's what most MP machines are used for. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sun, 27 Mar 2005 12:33:36 +0200 Subject: Re: hyper threading. [EMAIL PROTECTED] writes: You can argue the technical theory all you want, but the measurements say otherwise. You have to ensure that you're doing the right measurements. FreeBSD 4.9 - Load: 38% (I put this in for fun :-) Freebsd 5.4-Pre UP (no HT) - Load: high 55-60% range FreeBSD 5.4-Pre SMP/HT - Load: 70-80% (much more jumping around) You'll find that the total CPU time required from start to finish for a single thread is ALWAYS higher for SMP than for a UP environment, even if you have separate physical processors. Several things happen when you move from a uniprocessor environment to an environment with two or more processors: - The total CPU time for each thread increases. - The total system load on a per process basis increases. - The total throughput of the system improves if there is more than one independent process running in the system. - Each of the processors runs more slowly than it would if it were the only processor running in a UP environment. If you run a single-thread benchmark on a MP system, you'll find that it runs more slowly than it does on a UP system. If you run multiple single-thread independent benchmarks on a MP system, you'll find that total CPU time for each benchmark increases over that required in a UP system--but the elapsed time required to complete all benchmarks substantially diminishes. To properly gauge the performance of a multiprocessor system, you must run a realistic mix of tasks on the system and measure overall throughput. If you do this, you'll find that you always come out ahead with multiple processors, even HT processors. 
Hyperthreading is just a special case of multiprocessing that imposes some additional restrictions. HT is much more sensitive to similarities in instruction mix across processes, because the actual processor hardware is being shared. With a sufficiently heterogenous instruction mix across multiple execution threads, this isn't a problem; but if you are running a single-threaded benchmark, or a series of identical single-threaded benchmarks, it can seriously distort your measurements. Although adding physical processors diminishes the performance of each processor, it still adds overall processing power, up to a certain point. The increment is never equal to the actual number of processors added, though; that is, if you go from one to two processors, you never get a doubling of effective processor power--it's more like 70-80%. The percentage increment gets worse with each additional processor, until you reach a point at which performance actually starts to decline (the point at which this happens is extremely hardware dependent, but it's always well beyond two processors). Hyperthreaded processors should not diminish in performance just because HT is turned on, because the hardware contention that diminishes performance in conventional MP systems is largely absent in a HT microprocessor. However, since you are really still only sharing a single processor with HT, the overall increment is much lower than it would be with two physical processors, and it is very sensitive to the instruction mix. this shows that you really are a bit foggy. Did you miss the part where with 2 processors you actually do have 2 processors? I actually read what Intel had to say on how the architecture works, and I spent years measuring systems the hard way (with hardware monitors and probes), so I know somewhat whereof I speak. 
Multiprocessing was always a significant hot-button issue with customers, as they always wanted to know how much they really gained with multiple processors (as opposed to what they had been promised). I can make an argument that networking with 1 processor on 5.4 is better than with 2. For example, with a test similar to the above, with 2 physical processors FreeBSD 5.4 will start dropping packets way before it hits 500Kpps unless you increase the interrupts/second, which of course increases the system load. And even with the dropped packets (which should reduce the load because it doesn't have to receive and transmit the packet), the load is still higher than for 4.x with a single processor. Load is not a problem, as long as it's below 100%. Since individual processors slow down in MP configurations, anything that depends on raw processor speed will suffer in an MP configuration. However, overall system throughput is greatly enhanced by running with several processors. At the same time, the total processor time required to complete all tasks is greater in an MP environment than it would be in a UP environment--it's the fact that things can run in parallel that improves the throughput. Moral: if you want to avoid dropping
Re: hyper threading.
Test it yourself. I made a comment about making sure you test before you assume that HT is helpful. I don't feel compelled to convince you. Do what you want. -Original Message- From: John Pettitt [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: freebsd-questions@freebsd.org Sent: Sat, 26 Mar 2005 17:23:40 -0800 Subject: Re: hyper threading. Well, you've proven that if you pick your benchmark you can get the result you want. So what that says is that the kernel network code doesn't get any benefit from HT - given that HT is supposed to benefit diverse user tasks and not multiple copies of the same code this is not big news - since you have a HT box how about running a less system code intensive and more diverse test? John [EMAIL PROTECTED] wrote: You can argue the technical theory all you want, but the measurements say otherwise. You guys have done it once again. Baited me into firing up a test that I already know the results of:
Setup: Bridging em0 to em1
Load: 500Kpps, 60 bytes
3.4GHz P4 1MB Cache
FreeBSD 4.9 - Load: 38% (I put this in for fun :-)
FreeBSD 5.4-Pre UP (no HT) - Load: high 55-60% range
FreeBSD 5.4-Pre SMP/HT - Load: 70-80% (much more jumping around)
The bottom line is that if you don't test things to get real world results, you don't know crap. If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors. This shows that you really are a bit foggy. Did you miss the part where with 2 processors you actually do have 2 processors? I can make an argument that networking with 1 processor on 5.4 is better than with 2. 
For example, with a test similar to the above, with 2 physical processors FreeBSD 5.4 will start dropping packets way before it hits 500Kpps unless you increase the interrupts/second, which of course increases the system load. And even with the dropped packets (which should reduce the load because it doesn't have to receive and transmit the packet), the load is still higher than for 4.x with a single processor. You and many others regularly say things like SMP is obviously faster, or Opterons are noticeably faster, but those statements are only true for certain applications. I've tested an Opteron 2.0GHz against a 3.4GHz P4, and the results are pretty interesting. For raw performance, i.e. interrupts/second handling, the P4 wins easily. The P4 wins out of the cache. But once you grow out of the cache and get more memory intensive, the Opteron beats it handily. So which is really faster? You could argue both depending on what benchmark you use. You have to test it in the environment where you plan to use it. Because the answer is almost never black and white. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sat, 26 Mar 2005 23:45:21 +0100 Subject: Re: hyper threading. [EMAIL PROTECTED] writes: Yes, the theory is very nice; you've done a nice job reading Intel's marketing garb. I haven't read their marketing materials. I'm simply going by the technical descriptions I've read of the architecture. However if you don't have a specific hyperthreading-aware scheduler and particularly well-written, threaded applications, you'll lose more than you'll gain. If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors. 
Since FreeBSD's network stack isn't particularly well threaded, nor is the scheduler optimized for hyperthreading, you get a big mess at the kernel level. Nothing needs to be specially optimized for hyperthreading. All you need is at least two threads available for dispatch, with reasonably heterogeneous instruction mixes that can use different parts of the processor hardware at the same time. Real-world instruction mixes are often in this category in general-purpose operating systems. So if you have a nice application that does a lot of threaded math operations, you might think you've achieved something, Heavily math-oriented applications (or any group of applications that contains similar instruction mixes) are among the least likely to benefit from hyperthreading, because they will tend to use the same processor logic at the same time, effectively rendering hyperthreading moot. But what you've missed is that the overhead to manage the better utilization of the dual-pipelines created by HT costs more than it gains. Unless FreeBSD is very poorly written indeed, the gain from hyperthreading should still exceed the slight increase in overhead incurred by multiprocessing logic. Hence, the loss of performance. Where can I see this loss
Re: hyper threading.
You know, you spout all of this wonderful theory without considering the quality of the implementation. Everything is implementation. And a key point that you consistently overlook is that FreeBSD 5.x is a particularly poor implementation of SMP. Linux and Dragonfly get 80% improvement in performance with a 2nd processor, and FreeBSD doesn't. Theory is meaningless if the implementation sucks, which is more than just part of the point. The concept that the kernel is poorly implemented but userland is well done is just not an assumption that you can make. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sun, 27 Mar 2005 12:33:36 +0200 Subject: Re: hyper threading. [EMAIL PROTECTED] writes: You can argue the technical theory all you want, but the measurements say otherwise. You have to ensure that you're doing the right measurements. FreeBSD 4.9 - Load: 38% (I put this in for fun :-) Freebsd 5.4-Pre UP (no HT) - Load: high 55-60% range FreeBSD 5.4-Pre SMP/HT - Load: 70-80% (much more jumping around) You'll find that the total CPU time required from start to finish for a single thread is ALWAYS higher for SMP than for a UP environment, even if you have separate physical processors. Several things happen when you move from a uniprocessor environment to an environment with two or more processors: - The total CPU time for each thread increases. - The total system load on a per process basis increases. - The total throughput of the system improves if there is more than one independent process running in the system. - Each of the processors runs more slowly than it would if it were the only processor running in a UP environment. If you run a single-thread benchmark on a MP system, you'll find that it runs more slowly than it does on a UP system. 
If you run multiple single-thread independent benchmarks on a MP system, you'll find that total CPU time for each benchmark increases over that required in a UP system--but the elapsed time required to complete all benchmarks substantially diminishes. To properly gauge the performance of a multiprocessor system, you must run a realistic mix of tasks on the system and measure overall throughput. If you do this, you'll find that you always come out ahead with multiple processors, even HT processors. Hyperthreading is just a special case of multiprocessing that imposes some additional restrictions. HT is much more sensitive to similarities in instruction mix across processes, because the actual processor hardware is being shared. With a sufficiently heterogenous instruction mix across multiple execution threads, this isn't a problem; but if you are running a single-threaded benchmark, or a series of identical single-threaded benchmarks, it can seriously distort your measurements. Although adding physical processors diminishes the performance of each processor, it still adds overall processing power, up to a certain point. The increment is never equal to the actual number of processors added, though; that is, if you go from one to two processors, you never get a doubling of effective processor power--it's more like 70-80%. The percentage increment gets worse with each additional processor, until you reach a point at which performance actually starts to decline (the point at which this happens is extremely hardware dependent, but it's always well beyond two processors). Hyperthreaded processors should not diminish in performance just because HT is turned on, because the hardware contention that diminishes performance in conventional MP systems is largely absent in a HT microprocessor. 
However, since you are really still only sharing a single processor with HT, the overall increment is much lower than it would be with two physical processors, and it is very sensitive to the instruction mix. This shows that you really are a bit foggy. Did you miss the part where with 2 processors you actually do have 2 processors? I actually read what Intel had to say on how the architecture works, and I spent years measuring systems the hard way (with hardware monitors and probes), so I know somewhat whereof I speak. Multiprocessing was always a significant hot-button issue with customers, as they always wanted to know how much they really gained with multiple processors (as opposed to what they had been promised). I can make an argument that networking with 1 processor on 5.4 is better than with 2. For example, with a test similar to the above, with 2 physical processors FreeBSD 5.4 will start dropping packets way before it hits 500Kpps unless you increase the interrupts/second, which of course increases the system load. And even with the dropped packets (which should reduce the load because it doesn't have to receive and transmit the packet), the load is still higher than for 4.x with a single processor. Load is not a problem, as long as it's below 100%. Since individual processors slow down in MP configurations, anything
Re: hyper threading.
[EMAIL PROTECTED] writes: You know, you spout all of this wonderful theory without considering the quality of the implementation. Some things can be derived directly from theory. If you know the design of the hardware, you can predict that two processors will provide x% increment of throughput over a single processor, even if you don't actually measure them. In my case, I cite both theory and my own experience in measuring actual systems. The general principles of behavior of multiprocessor systems are well understood, although specific implementations vary. It is clear, based even on design data alone, that hyperthreading will generally improve throughput and should never diminish it (disregarding OS overhead). It is equally clear that the gain won't be as great as having physically independent processors, but the idea of putting more of the idle processor logic to work is a good one. And a key point that you consistently overlook is that FreeBSD 5.x is a particularly poor implementation of SMP. Linux and Dragonfly get 80% improvement in performance with a 2nd processor, and FreeBSD doesn't. I'd need to see measurements to substantiate this. In general, when it comes to optimization, it's best not to fret too much over how many percentage points of processor power or throughput you gain or lose with specific configuration or implementation choices. If your system is running so close to the wire that five percent makes the difference between 100% busy and less than 100% busy, you need more hardware in any case. The concept that the kernel is poorly implemented but userland is well done is just not an assumption that you can make. Actually, it's not something that I spend a lot of time thinking about. Right now, my production system is never more than 0.4% busy. And if it were 99% busy, I'd be looking at faster hardware, no matter what OS or HT/MP options I might have implemented. 
-- Anthony
Re: hyper threading.
[EMAIL PROTECTED] writes: Right. That's what I said. You'll kill your networking. Beyond a certain network load, you have to increase the number of timer interrupts per second no matter how fast your processors are or how many of them you have, if you are polling your I/O interfaces instead of being driven from interrupts. I don't like the idea of routinely running 1000 timer interrupts per second, but I note that FreeBSD 6.x apparently is moving to this number (?). I'd prefer that it be readily configurable. There are other options but I'm not sure how well x86 hardware supports them. Having a very accurate, very high resolution elapsed-time counter on the processor(s) can help lower overhead by allowing the OS to get accurate time information without waiting for an interrupt and with execution of only a single instruction. Having programmable, very high resolution timers would help, too. -- Anthony
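To see why a polled interface needs more timer ticks as the packet rate climbs, here is a rough Python sketch of the arithmetic. The 500 Kpps load and the 1000 ticks per second come from the thread. The ring size of 256 slots is a hypothetical value chosen for illustration, not a documented driver default, and the sketch assumes packets arrive evenly and the interface is drained once per tick.

```python
def packets_per_tick(pps, hz):
    """Average number of packets arriving between two consecutive timer-driven polls."""
    return pps / hz


def min_hz(pps, ring_slots):
    """Smallest whole tick rate at which one tick's worth of arrivals fits in the ring."""
    return -(-pps // ring_slots)  # ceiling division on integers


PPS = 500_000      # packet rate used in the thread's bridging test
RING_SLOTS = 256   # hypothetical receive ring size, for illustration only

for hz in (100, 1000, 2000):
    burst = packets_per_tick(PPS, hz)
    status = "fits" if burst <= RING_SLOTS else "overflows"
    print(f"HZ={hz}: {burst:.0f} packets per tick, {status} a {RING_SLOTS}-slot ring")

print(f"Minimum tick rate for this ring: {min_hz(PPS, RING_SLOTS)} per second")
```

With these assumed numbers, even 1000 ticks per second leaves more packets per tick than the ring holds, which is the trade-off between tick rate and dropped packets under discussion.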
Re: hyper threading.
On Saturday 26 March 2005 22:45, Anthony Atkielski wrote: [EMAIL PROTECTED] writes: Yes, the theory is very nice; you've done a nice job reading Intel's marketing garb. I haven't read their marketing materials. I'm simply going by the technical descriptions I've read of the architecture. However if you don't have a specific hyperthreading-aware scheduler and particularly well-written, threaded applications, you'll lose more than you'll gain. If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors. The situation is very different. Multiple processors can run multiple processes at the same time. A HT processor can only run two threads from the same process. And most software isn't multithreaded. ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: hyper threading.
RW writes: Multiple processors can run multiple processes at the same time. A HT processor can only run two threads from the same process. This is incorrect. HT processors don't care where the threads come from; it is possible to run threads from two completely different processes on the same HT processor. The threads have completely independent architectural states and can come from anywhere in the system. However, the design of hyperthreading favors a certain amount of commonality between the contexts of each thread. While it's best to have different instruction mixes, it helps if both threads are executing in the same memory spaces, since resources such as on-device cache are shared between the threads in the HT processor. Completely different threads in different processes executing out of different areas of memory might cause more contention for cache and similar resources (TLB, etc.), diminishing the advantage of hyperthreading. Also, spin waits need special consideration on HT processors. If one thread spins on a gate or semaphore held by the other thread, it effectively slows the other thread down, keeping both threads moving more slowly than they might if they were in completely separate processors. A solution for this suggested by Intel is the PAUSE instruction, which forces complete execution of a spin-wait instruction before the next execution can begin, thus freeing resources for the other thread in the processor. Intel recommends the use of PAUSE even when HT processors are not being used. Still another recommendation is to schedule first physically independent processors, then HT logical processors. This requires that the OS be aware of the difference between the two. This usually makes more efficient use of processor resources, except for some very specific cases where running two threads on the same HT processor might run as fast or faster than running them separately (if they are referencing a lot of the same shared resources, such as cache). 
Intel claims up to a 30% improvement in throughput for an HT processor as compared to a normal processor. For truly separate physical processors, the improvement is more like 60%, and possibly much more. Hyperthreading should not be seen as a substitute for multiple processors. It's more like a way to make better use of each processor. Hyperthreading is especially useful when multiple execution threads exist in a common context, such as multithreaded daemons or multithreaded desktop applications. In these situations, the HT architecture is used to the fullest, with corresponding improvements in performance. Recent changes in FreeBSD architecture to allow a multithreaded kernel are among the situations in which hyperthreading can be put to good use. The new multithreaded architecture of Apache 2.x should also be able to put HT to good use. -- Anthony ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to [EMAIL PROTECTED]
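The scheduling recommendation above (fill physically independent processors first, then the HT logical processors) can be sketched in Python as follows. This is a toy model of the placement order only, not how any real scheduler is implemented; the function names and the (core, sibling) numbering are made up for the example.

```python
def placement_order(n_cores, siblings_per_core=2):
    """Order in which logical CPUs are filled: one per physical core first, then HT siblings."""
    return [
        (core, sibling)
        for sibling in range(siblings_per_core)
        for core in range(n_cores)
    ]


def place(n_threads, n_cores, siblings_per_core=2):
    """Assign runnable threads to (core, sibling) slots following placement_order."""
    order = placement_order(n_cores, siblings_per_core)
    if n_threads > len(order):
        raise ValueError("more runnable threads than logical CPUs")
    return order[:n_threads]


# Two threads on a two-core HT machine land on different physical cores.
print(place(2, 2))

# A third thread has to share core 0 through its second logical processor.
print(place(3, 2))
```

The OS has to know which logical processors share a physical core for this ordering to be possible, which is why the message says the OS must be aware of the difference.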
Re: hyper threading.
On Sunday 27 March 2005 22:33, Anthony Atkielski wrote: RW writes: Multiple processors can run multiple processes at the same time. A HT processor can only run two threads from the same process. This is incorrect. HT processors don't care where the threads come from; it is possible to run threads from two completely different processes on the same HT processor. But what would be the point, that's slower than running with HT turned-off. ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: hyper threading.
RW writes: But what would be the point, that's slower than running with HT turned-off. Not necessarily. It depends on a lot of things. In any case, nobody is forced to run with HT and SMP enabled. -- Anthony
Re: hyper threading.
Polling is simply unnecessary in most cases. You could get better performance using an em driver and setting max ints to whatever is optimal for your system. Polling adds latency and overhead for no good reason. As I've said before, the FreeBSD team is patently clueless. They're grasping at straws. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sun, 27 Mar 2005 20:04:16 +0200 Subject: Re: hyper threading. [EMAIL PROTECTED] writes: Right. That's what I said. You'll kill your networking. Beyond a certain network load, you have to increase the number of timer interrupts per second no matter how fast your processors are or how many of them you have, if you are polling your I/O interfaces instead of being driven from interrupts. I don't like the idea of routinely running 1000 timer interrupts per second, but I note that FreeBSD 6.x apparently is moving to this number (?). I'd prefer that it be readily configurable. There are other options but I'm not sure how well x86 hardware supports them. Having a very accurate, very high resolution elapsed-time counter on the processor(s) can help lower overhead by allowing the OS to get accurate time information without waiting for an interrupt and with execution of only a single instruction. Having programmable, very high resolution timers would help, too. -- Anthony
Re: hyper threading.
I've never seen any measurements. And most of your theories are clearly incorrect for FreeBSD. So what good is it? You claim to have done measurements, so what do you have to refute it? Being a fool is a choice. It's easily turned. The problem is when you can't get more hardware. When you are pushing the envelope, then you run out of choices. There is also a price/performance consideration. You make a choice to spend an extra 30% for certain hardware. But if you can get the same performance using lesser hardware with different settings or a different version of the OS, then you are wasting your money. If you don't need much, or you are spending someone else's money, then everything is moot. Just use what's cool. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sun, 27 Mar 2005 20:01:57 +0200 Subject: Re: hyper threading. [EMAIL PROTECTED] writes: You know, you spout all of this wonderful theory without considering the quality of the implementation. Some things can be derived directly from theory. If you know the design of the hardware, you can predict that two processors will provide x% increment of throughput over a single processor, even if you don't actually measure them. In my case, I cite both theory and my own experience in measuring actual systems. The general principles of behavior of multiprocessor systems are well understood, although specific implementations vary. It is clear, based even on design data alone, that hyperthreading will generally improve throughput and should never diminish it (disregarding OS overhead). It is equally clear that the gain won't be as great as having physically independent processors, but the idea of putting more of the idle processor logic to work is a good one. And a key point that you consistently overlook is that FreeBSD 5.x is a particularly poor implementation of SMP. Linux and Dragonfly get 80% improvement in performance with a 2nd processor, and FreeBSD doesn't. 
I'd need to see measurements to substantiate this. In general, when it comes to optimization, it's best not to fret too much over how many percentage points of processor power or throughput you gain or lose with specific configuration or implementation choices. If your system is running so close to the wire that five percent makes the difference between 100% busy and less than 100% busy, you need more hardware in any case. The concept that the kernel is poorly implemented but userland is well done is just not an assumption that you can make. Actually, it's not something that I spend a lot of time thinking about. Right now, my production system is never more than 0.4% busy. And if it were 99% busy, I'd be looking at faster hardware, no matter what OS or HT/MP options I might have implemented. -- Anthony
Re: hyper threading.
[EMAIL PROTECTED] writes: Polling is simply unnecessary in most cases. You could get better performance using an em driver and setting max ints to whatever is optimal for your system. Polling adds latency and overhead for no good reason. Polling often provides better performance, at the expense of higher overhead. -- Anthony
Re: hyper threading.
Perttu Laine writes: I have a 3.4GHz HT processor and FreeBSD shows only one processor. I suppose it should show two on HT models? So the GENERIC kernel doesn't support it? What should I add to the kernel config to enable it? From reading the config examples I think this should be enough: options SMP Yes, that's all you need. Just add that line, rebuild and reinstall the kernel, and you're all set. Works great. Hyperthreading doesn't buy you as much as truly separate processors, but it helps you get more bang for the buck out of your single processor (depending on the type of workload you run). -- Anthony
Re: hyper threading.
This is the kind of disinformation I have been referring to. You'll get much better performance with 1 processor in UP mode. I suggest you do some testing. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sat, 26 Mar 2005 19:28:11 +0100 Subject: Re: hyper threading. Perttu Laine writes: I have a 3.4GHz HT processor and FreeBSD shows only one processor. I suppose it should show two on HT models? So the GENERIC kernel doesn't support it? What should I add to the kernel config to enable it? From reading the config examples I think this should be enough: options SMP Yes, that's all you need. Just add that line, rebuild and reinstall the kernel, and you're all set. Works great. Hyperthreading doesn't buy you as much as truly separate processors, but it helps you get more bang for the buck out of your single processor (depending on the type of workload you run).
Re: hyper threading.
[EMAIL PROTECTED] wrote:

> This is the kind of disinformation I have been referring to. You'll get much better performance with 1 processor in UP mode. I suggest you do some testing.
>
> [snip]

If you feel someone is in error, feel free to jump in and offer what you feel to be correct information. Sometimes sitting back and not correcting someone is far worse than someone offering information based on what they know, experience, or what have you.

In this case, by NOT offering the correct information, YOU are just as much to blame for what you say is going on. Those of us who don't answer either don't know (as is the case with myself) or have not had a chance to read the thread.

-- Best regards, Chris

It is a simple task to make things complex, but a complex task to make them simple.
Re: hyper threading.
I am offering the correct information. Turning on SMP on an HT machine will kill the system's performance much more than hyperthreading will gain. I told him to test. The degradation is easily measurable.

-----Original Message-----
From: Chris [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: freebsd-questions@freebsd.org
Sent: Sat, 26 Mar 2005 13:49:53 -0600
Subject: Re: hyper threading.

[snip]
Re: hyper threading.
[EMAIL PROTECTED] writes:

> You'll get much better performance with 1 processor in UP mode. I suggest you do some testing.

Where can I see the results of your own exhaustive tests?

The purpose of hyperthreading is to keep all hardware on the microprocessor working. Many instructions use only certain parts of the chip, leaving other parts idle. By allowing two execution contexts to be maintained simultaneously, hyperthreading makes it possible to better utilize hardware that might otherwise sit idle. The ideal case would be two threads executing completely different instruction sequences that use very different parts of the chip. I don't have exact figures, but I'd guess that in ideal situations you might get 20%-30% extra out of a single processor in this way--enough to negate the greater overhead of the SMP logic.

A situation in which hyperthreading would _not_ help would be any type of parallel processing in which multiple threads execute very similar instructions. These instructions are likely to require the same parts of the microprocessor at the same time, so it's unlikely that they will be able to execute in parallel--one will have to wait for the other (because the microprocessor has logic areas that can function independently and simultaneously, but these areas don't do the same things, so they are not redundant logic).

-- Anthony
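The trade-off argued over here (an HT gain of perhaps 20%-30% against the extra cost of the SMP kernel) can be put into a back-of-the-envelope form. The sketch below is a toy model and an assumption, not something stated in the thread: it supposes HT multiplies useful throughput by (1 + gain) while SMP bookkeeping leaves only (1 - overhead) of the processor for useful work. The function names are hypothetical.

```python
def net_change(ht_gain, smp_overhead):
    """Net throughput change relative to a uniprocessor kernel.

    Toy model (an assumption): HT scales useful throughput by
    (1 + ht_gain); SMP overhead leaves (1 - smp_overhead) of the
    processor available for useful work.
    """
    return (1.0 + ht_gain) * (1.0 - smp_overhead) - 1.0


def break_even_overhead(ht_gain):
    """SMP overhead fraction at which the HT gain is exactly cancelled."""
    # Solve (1 + g) * (1 - o) = 1 for o.
    return ht_gain / (1.0 + ht_gain)


if __name__ == "__main__":
    for gain in (0.20, 0.30):  # the guessed 20%-30% range from the message
        print(f"HT gain {gain:.0%}: break-even SMP overhead "
              f"{break_even_overhead(gain):.1%}")
```

Under this model the two positions in the thread are not contradictory in principle: which side wins depends on whether the real overhead for a given workload sits above or below the break-even value, and that can only be settled by measurement.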
Re: hyper threading.
[EMAIL PROTECTED] writes:

> I am offering the correct information. Turning on SMP on an HT machine will kill the system's performance much more than hyperthreading will gain.

Why? I've explained why hyperthreading can provide a modest gain in performance. Now explain to me why it would not.

> I told him to test. The degradation is easily measurable.

If you can say with certainty that a degradation occurs, then you've already tested, in which case you can show your work. If you haven't tested, then you can't say anything with certainty, in which case your opinions are pure conjecture.

A quick look at actual research done by various parties on the Web reveals that HT does provide the modest improvements to which I've alluded. It's not as impressive as two processors, but then again, nobody claimed it would be. It just makes better use of one processor and allows you to get more for your money from that processor.

One advantage that I had not previously mentioned is that the availability of a logical processor for dispatch can improve response time in certain scenarios, even if the overall processor power doesn't increase that much. When compute-bound processes monopolize a single processor, the response time of the entire system can suffer; but if you have a second processor waiting for dispatch (even a logical HT processor), you can immediately attend to other tasks even as the compute-bound process runs, as long as it isn't launching multiple threads (which most such processes won't do).

-- Anthony
Re: hyper threading.
Yes, the theory is very nice; you've done a nice job reading Intel's marketing garb. However, if you don't have a specific hyperthreading-aware scheduler and particularly well-written, threaded applications, you'll lose more than you'll gain. Since FreeBSD's network stack isn't particularly well threaded, nor is the scheduler optimized for hyperthreading, you get a big mess at the kernel level.

So if you have a nice application that does a lot of threaded math operations, you might think you've achieved something. But what you've missed is that the overhead to manage the better utilization of the dual pipelines created by HT costs more than it gains. Hence, the loss of performance.

The problem is not at the application level, but at the kernel level. The SMP overhead is so substantial, and the OS is working thinking it has 2 processors, that process switching and interrupt handling slow down considerably. A machine with a 50% load UP will run 65-70% load with HT/SMP running. Like I said, it's easily measurable. That's at the kernel level (say routing or bridging performance).

Now if the machine isn't a server, it may be just fine. That's why I suggested testing. But for a network server HT is bad. Very bad.

Not only that, but FreeBSD 5.x actually has a higher capacity network-wise with 1 processor than 2, and I'm sure you can theorize why 2 processors should be faster than one. The theory only matters if you have well-written code to handle it properly. FreeBSD is a long way off from that.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Sat, 26 Mar 2005 22:06:38 +0100
Subject: Re: hyper threading.

[snip]
Re: hyper threading.
[EMAIL PROTECTED] writes:

> Yes, the theory is very nice; you've done a nice job reading Intel's marketing garb.

I haven't read their marketing materials. I'm simply going by the technical descriptions I've read of the architecture.

> However, if you don't have a specific hyperthreading-aware scheduler and particularly well-written, threaded applications, you'll lose more than you'll gain.

If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors.

> Since FreeBSD's network stack isn't particularly well threaded, nor is the scheduler optimized for hyperthreading, you get a big mess at the kernel level.

Nothing needs to be specially optimized for hyperthreading. All you need is at least two threads available for dispatch, with reasonably heterogeneous instruction mixes that can use different parts of the processor hardware at the same time. Real-world instruction mixes are often in this category in general-purpose operating systems.

> So if you have a nice application that does a lot of threaded math operations, you might think you've achieved something,

Heavily math-oriented applications (or any group of applications that contains similar instruction mixes) are among the least likely to benefit from hyperthreading, because they will tend to use the same processor logic at the same time, effectively rendering hyperthreading moot.

> But what you've missed is that the overhead to manage the better utilization of the dual pipelines created by HT costs more than it gains.

Unless FreeBSD is very poorly written indeed, the gain from hyperthreading should still exceed the slight increase in overhead incurred by multiprocessing logic.

> Hence, the loss of performance.

Where can I see this loss of performance documented?

> The problem is not at the application level, but at the kernel level. The SMP overhead is so substantial, and the OS is working thinking it has 2 processors, that process switching and interrupt handling slow down considerably.

How much is "so substantial"? Where can I see this documented?

> A machine with a 50% load UP will run 65-70% load with HT/SMP running. Like I said, it's easily measurable.

Then you can show me the measurements. Where are they? An increase of up to 40% in system load just because of multiprocessing is enormous. Where did you get this figure?

> That's at the kernel level (say routing or bridging performance).

But the kernel is only a small fraction of overall processor utilization.

> Now if the machine isn't a server, it may be just fine. That's why I suggested testing. But for a network server HT is bad. Very bad.

It doesn't matter whether the machine is a server or a desktop. What matters is the specific mix and nature of applications.

> Not only that, but FreeBSD 5.x actually has a higher capacity network-wise with 1 processor than 2 ...

Here again, I need to see this documented.

> ... and I'm sure you can theorize why 2 processors should be faster than one. The theory only matters if you have well-written code to handle it properly. FreeBSD is a long way off from that.

Where can I see the measurements?

-- Anthony
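The figures being argued over mix two different measures: the claimed move from 50% load to 65-70% load is 15 to 20 percentage points, which is a 30% to 40% increase relative to the uniprocessor load. A short sketch of that arithmetic, using only the numbers quoted in the message (the function name `relative_increase` is made up for illustration):

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


if __name__ == "__main__":
    up_load = 50.0                 # claimed uniprocessor load, in percent
    for smp_load in (65.0, 70.0):  # claimed HT/SMP load range, in percent
        points = smp_load - up_load
        rel = relative_increase(up_load, smp_load)
        print(f"{up_load:.0f}% -> {smp_load:.0f}%: "
              f"+{points:.0f} points, {rel:.0%} relative increase")
```

So the 40% figure corresponds to the upper end of the claimed range; the lower end works out to 30%.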
Re: hyper threading.
On Sat, Mar 26, 2005 at 11:45:21PM +0100, Anthony Atkielski wrote:

> Where can I see the measurements?

Here are some measurements. A few weeks ago I ran Unixbench 4.1.0 (/usr/ports/benchmarks/unixbench) on a P4 2.8GHz with and without hyperthreading enabled. I note a slight difference in the 15-minute load average in favour of the uniprocessor run (0.00 vs 0.10 in the hyperthreading run), though I doubt this alone could account for a 15% difference in total score.

Uniprocessor run:

BYTE UNIX Benchmarks (Version 4.1.0)
System -- bigbird.logicsquad.net
Start Benchmark Run: Sun Feb 20 08:23:08 CST 2005
14 interactive users.
8:23AM up 3 days, 14:37, 14 users, load averages: 0.00, 0.00, 0.00
-r-xr-xr-x 1 root wheel 105624 Feb 12 00:09 /bin/sh
/bin/sh: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), for FreeBSD 5.3-CURRENT (rev 1), dynamically linked (uses shared libs), stripped
/dev/mirror/gm0s1f 164607432 5190146 146248692 3% /usr

Dhrystone 2 using register variables     4438000.0 lps   (10.0 secs, 10 samples)
Double-Precision Whetstone                   786.2 MWIPS (10.4 secs, 10 samples)
System Call Overhead                      387391.7 lps   (10.0 secs, 10 samples)
Pipe Throughput                           595757.1 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching               94343.7 lps   (10.0 secs, 10 samples)
Process Creation                            5143.3 lps   (30.0 secs, 3 samples)
Execl Throughput                            1127.4 lps   (29.9 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks     637932.0 KBps  (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks     86241.0 KBps  (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks      84790.0 KBps  (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks       182188.0 KBps  (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks       83127.0 KBps  (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks        53860.0 KBps  (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks    1662218.0 KBps  (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks     47821.0 KBps  (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks      47003.0 KBps  (30.0 secs, 3 samples)
Shell Scripts (1 concurrent)                2584.9 lpm   (60.0 secs, 3 samples)
Shell Scripts (8 concurrent)                 353.3 lpm   (60.0 secs, 3 samples)
Shell Scripts (16 concurrent)                177.0 lpm   (60.0 secs, 3 samples)
Arithmetic Test (type = short)            687842.3 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = int)              697114.1 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = long)             697313.5 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = float)            658678.8 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = double)           658663.3 lps   (10.0 secs, 3 samples)
Arithoh                                 14359071.4 lps   (10.0 secs, 3 samples)
C Compiler Throughput                       1373.3 lpm   (60.0 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places          161336.3 lpm   (30.0 secs, 3 samples)
Recursion Test--Tower of Hanoi             98086.8 lps   (20.0 secs, 3 samples)

INDEX VALUES
TEST                                      BASELINE     RESULT    INDEX
Dhrystone 2 using register variables      116700.0  4438000.0    380.3
Double-Precision Whetstone                    55.0      786.2    142.9
Execl Throughput                              43.0     1127.4    262.2
File Copy 1024 bufsize 2000 maxblocks       3960.0    84790.0    214.1
File Copy 256 bufsize 500 maxblocks         1655.0    53860.0    325.4
File Copy 4096 bufsize 8000 maxblocks       5800.0    47003.0     81.0
Pipe Throughput                            12440.0   595757.1    478.9
Pipe-based Context Switching                4000.0    94343.7    235.9
Process Creation                             126.0     5143.3    408.2
Shell Scripts (8 concurrent)                   6.0      353.3    588.8
System Call Overhead                       15000.0   387391.7    258.3
                                                               =======
FINAL SCORE                                                      270.4

Hyperthreading run:

BYTE UNIX Benchmarks (Version 4.1.0)
System -- bigbird.logicsquad.net
Start Benchmark Run: Sun Feb 20 17:22:33 CST 2005
2 interactive users.
5:22PM up 2 mins, 2 users, load averages: 0.31, 0.23, 0.10
-r-xr-xr-x 1 root wheel 105624 Feb 12 00:09 /bin/sh
/bin/sh: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), for FreeBSD 5.3-CURRENT (rev 1), dynamically linked (uses shared libs), stripped
/dev/mirror/gm0s1f 164607432 5264584 146174254 3% /usr

Dhrystone 2 using register variables     4463262.0 lps   (10.0 secs, 10 samples)
Double-Precision Whetstone                   785.8 MWIPS
Re: hyper threading.
Paul A. Hoadley wrote:

> On Sat, Mar 26, 2005 at 11:45:21PM +0100, Anthony Atkielski wrote:
>
>> Where can I see the measurements?
>
> Here are some measurements. A few weeks ago I ran Unixbench 4.1.0 (/usr/ports/benchmarks/unixbench) on a P4 2.8GHz with and without hyperthreading enabled. I note a slight difference in the 15-minute load average in favour of the uniprocessor run (0.00 vs 0.10 in the hyperthreading run), though I doubt this alone could account for a 15% difference in total score.
>
> Uniprocessor run:
> BYTE UNIX Benchmarks (Version 4.1.0)
> System -- bigbird.logicsquad.net
> Start Benchmark Run: Sun Feb 20 08:23:08 CST 2005
> 14 interactive users.
> 8:23AM up 3 days, 14:37, 14 users, load averages: 0.00, 0.00, 0.00
> [snip]
> FINAL SCORE 270.4
>
> Hyperthreading run:
> BYTE UNIX Benchmarks (Version 4.1.0)
> System -- bigbird.logicsquad.net
> Start Benchmark Run: Sun Feb 20 17:22:33 CST 2005
> 2 interactive users.
> 5:22PM up 2 mins, 2 users, load averages: 0.31, 0.23, 0.10
> [snip]
> FINAL SCORE 228.9

Notice the HT run had load on the box (0.31) when it started. If you're going to run benchmarks, you need to start with a clean reboot before each run and make sure all the background daemons have been killed and the load is zero.

However, even then this is not a good test of HT. The point of HT is to improve throughput in multi-threaded workloads, and the benchmark suite is basically single-threaded. What would be more interesting would be to run a test with a constant background load also running. In theory, HT should do a better job of balancing the load between the benchmark and the background than the BSD scheduler can on its own.

I don't have an HT box here or I'd try it, but I'd love to know how it comes out if somebody is up for it.
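The two final scores can be related back to the posted uniprocessor index table with a little arithmetic. The sketch below rests on two assumptions inferred from the posted figures rather than stated in the thread: that each index is 10 x result / baseline, and that the final score is the geometric mean of the indices. Both reproduce the posted uniprocessor numbers to within rounding. The names `index`, `final_score` and `percent_drop` are hypothetical.

```python
import math

# (baseline, result) pairs copied from the uniprocessor INDEX VALUES table.
UNIPROCESSOR = {
    "Dhrystone 2 using register variables": (116700.0, 4438000.0),
    "Double-Precision Whetstone": (55.0, 786.2),
    "Execl Throughput": (43.0, 1127.4),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 84790.0),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 53860.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 47003.0),
    "Pipe Throughput": (12440.0, 595757.1),
    "Pipe-based Context Switching": (4000.0, 94343.7),
    "Process Creation": (126.0, 5143.3),
    "Shell Scripts (8 concurrent)": (6.0, 353.3),
    "System Call Overhead": (15000.0, 387391.7),
}


def index(baseline, result):
    """Per-test index, assumed to be 10 * result / baseline."""
    return 10.0 * result / baseline


def final_score(tests):
    """Geometric mean of the per-test indices (assumed scoring rule)."""
    logs = [math.log(index(b, r)) for b, r in tests.values()]
    return math.exp(sum(logs) / len(logs))


def percent_drop(reference, other):
    """How far `other` falls below `reference`, in percent of `reference`."""
    return 100.0 * (reference - other) / reference


if __name__ == "__main__":
    print(f"recomputed uniprocessor score: {final_score(UNIPROCESSOR):.1f}")
    print(f"HT score is {percent_drop(270.4, 228.9):.1f}% below uniprocessor")
```

If the geometric-mean assumption holds, one badly affected test can pull the total down noticeably, which is a reason to compare the per-test indices of the two runs and not only the totals.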
Re: hyper threading.
Hello,

> However, even then this is not a good test of HT. The point of HT is to improve throughput in multi-threaded workloads, and the benchmark suite is basically single-threaded. What would be more interesting would be to run a test with a constant background load also running. In theory, HT should do a better job of balancing the load between the benchmark and the background than the BSD scheduler can on its own.
>
> I don't have an HT box here or I'd try it, but I'd love to know how it comes out if somebody is up for it.

It would be interesting to see the results of the BSD ULE scheduler on 5.4 Pre and 6 compared to 5.3R.

--Nick
Re: hyper threading.
Hello,

On Sat, Mar 26, 2005 at 03:54:06PM -0800, John Pettitt wrote:

> Paul A. Hoadley wrote:
>
>> I note a slight difference in the 15-minute load average in favour of the uniprocessor run (0.00 vs 0.10 in the hyperthreading run), though I doubt this alone could account for a 15% difference in total score.
>
> Notice the HT run had load on the box (0.31) when it started. If you're going to run benchmarks, you need to start with a clean reboot before each run and make sure all the background daemons have been killed and the load is zero.

You are absolutely right, and I did note the difference in load averages. I'm not making any claims---someone asked for measurements, and I happened to have these handy.

-- Paul. w http://logicsquad.net/ h http://paul.hoadley.name/
Re: hyper threading.
Uh, that's not the correct load average to use. Use the numbers obtained from top or systat. The uptime load averages will show zero load when you're routing 100K pps. They don't measure kernel load.

-----Original Message-----
From: Paul A. Hoadley [EMAIL PROTECTED]
To: John Pettitt [EMAIL PROTECTED]
Cc: freebsd-questions@freebsd.org
Sent: Sun, 27 Mar 2005 09:53:25 +0930
Subject: Re: hyper threading.

[snip]
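The distinction drawn here is between the three numbers in the uptime banner and a CPU utilization percentage. The banner figures are run-queue averages over 1, 5 and 15 minutes, not percentages, so they answer a different question from the "load" shown by top or systat. A minimal sketch for reading them, assuming a Python interpreter on a Unix-like system (`os.getloadavg()` is not available on every platform); the wrapper name `load_averages` is made up:

```python
import os


def load_averages():
    """Return the (1, 5, 15)-minute load averages reported by the OS."""
    # Unix only; raises OSError if the averages cannot be obtained.
    return os.getloadavg()


if __name__ == "__main__":
    one, five, fifteen = load_averages()
    print(f"load averages: {one:.2f}, {five:.2f}, {fifteen:.2f}")
```

For a forwarding or bridging test like the ones discussed in this thread, these values would be recorded alongside the CPU-state percentages, not in place of them.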
Re: hyper threading.
You can argue the technical theory all you want, but the measurements say otherwise. You guys have done it once again: baited me into firing up a test that I already know the results of.

Setup: Bridging em0 to em1
Load: 500Kpps, 60 bytes
3.4Ghz P4 1MB Cache

FreeBSD 4.9 - Load: 38% (I put this in for fun :-)
FreeBSD 5.4-Pre UP (no HT) - Load: high 55-60% range
FreeBSD 5.4-Pre SMP/HT - Load: 70-80% (much more jumping around)

The bottom line is that if you don't test things to get real-world results, you don't know crap.

> If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors.

This shows that you really are a bit foggy. Did you miss the part where with 2 processors you actually do have 2 processors? I can make an argument that networking with 1 processor on 5.4 is better than with 2. For example, with a test similar to the above, with 2 physical processors FreeBSD 5.4 will start dropping packets way before it hits 500Kpps unless you increase the interrupts/second, which of course increases the system load. And even with the dropped packets (which should reduce the load because it doesn't have to receive and transmit the packet), the load is still higher than for 4.x with a single processor.

You and many others regularly say things like "SMP is obviously faster" or "Opterons are noticeably faster", but those statements are only true for certain applications. I've tested an Opteron 2.0Ghz against a 3.4Ghz P4, and the results are pretty interesting. For raw performance, i.e. interrupts/second handling, the P4 wins easily. The P4 wins while the working set fits in the cache. But once you grow out of the cache and get more memory intensive, the Opteron beats it handily. So which is really faster? You could argue both depending on what benchmark you use. You have to test it in the environment where you plan to use it, because the answer is almost never black and white.

-----Original Message-----
From: Anthony Atkielski [EMAIL PROTECTED]
To: freebsd-questions@freebsd.org
Sent: Sat, 26 Mar 2005 23:45:21 +0100
Subject: Re: hyper threading.

[snip]
Re: hyper threading.
Well, you've proven that if you pick your benchmark you can get the result you want. So what that says is that the kernel network code doesn't get any benefit from HT. Given that HT is supposed to benefit diverse user tasks and not multiple copies of the same code, this is not big news. Since you have an HT box, how about running a less system-code-intensive and more diverse test?

John

[EMAIL PROTECTED] wrote:

[snip]
Re: hyper threading.
When you get your machine running without a kernel let me know. The kernel is the key to the O/S. If you don't need networking and don't have many interrupts, then it probably doesn't matter that much. -Original Message- From: John Pettitt [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: freebsd-questions@freebsd.org Sent: Sat, 26 Mar 2005 17:23:40 -0800 Subject: Re: hyper threading. Well, you've proven that if you pick your benchmark you can get the result you want. So what that says is that the kernel network code doesn't get any benefit from HT - given that HT is supposed to benefit diverse user tasks and not multiple copies of the same code this is not big news - since you have a HT box how about running a less system code intensive and more diverse test? John [EMAIL PROTECTED] wrote: You can argue the technical theory all you want, but the measurements say otherwise. You guys have done it once again. Baited me into firing up a test that I already know the results of: Setup: Bridging em0 to em1. Load: 500Kpps, 60 bytes. 3.4GHz P4, 1MB Cache. FreeBSD 4.9 - Load: 38% (I put this in for fun :-). FreeBSD 5.4-Pre UP (no HT) - Load: high 55-60% range. FreeBSD 5.4-Pre SMP/HT - Load: 70-80% (much more jumping around). The bottom line is that if you don't test things to get real world results, you don't know crap. If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors. This shows that you really are a bit foggy. Did you miss the part where with 2 processors you actually do have 2 processors? I can make an argument that networking with 1 processor on 5.4 is better than with 2.
For example, with a test similar to the above, with 2 physical processors FreeBSD 5.4 will start dropping packets way before it hits 500Kpps unless you increase the interrupts/second, which of course increases the system load. And even with the dropped packets (which should reduce the load because it doesn't have to receive and transmit the packet), the load is still higher than for 4.x with a single processor. You and many others regularly say things like "SMP is obviously faster" or "Opterons are noticeably faster", but those statements are only true for certain applications. I've tested an Opteron 2.0GHz against a 3.4GHz P4, and the results are pretty interesting. For raw performance, i.e. interrupts/second handling, the P4 wins easily. The P4 wins out of the cache. But once you grow out of the cache and get more memory intensive, the Opteron beats it handily. So which is really faster? You could argue both depending on what benchmark you use. You have to test it in the environment where you plan to use it. Because the answer is almost never black and white. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sat, 26 Mar 2005 23:45:21 +0100 Subject: Re: hyper threading. [EMAIL PROTECTED] writes: Yes, the theory is very nice; you've done a nice job reading Intel's marketing garb. I haven't read their marketing materials. I'm simply going by the technical descriptions I've read of the architecture. However, if you don't have a specific hyperthreading-aware scheduler and particularly well-written, threaded applications, you'll lose more than you'll gain. If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors.
Since FreeBSD's network stack isn't particularly well threaded, nor is the scheduler optimized for hyperthreading, you get a big mess at the kernel level. Nothing needs to be specially optimized for hyperthreading. All you need is at least two threads available for dispatch, with reasonably heterogeneous instruction mixes that can use different parts of the processor hardware at the same time. Real-world instruction mixes are often in this category in general-purpose operating systems. So if you have a nice application that does a lot of threaded math operations, you might think you've achieved something, Heavily math-oriented applications (or any group of applications that contains similar instruction mixes) are among the least likely to benefit from hyperthreading, because they will tend to use the same processor logic at the same time, effectively rendering hyperthreading moot. But what you've missed is that the overhead to manage the better utilization of the dual-pipelines created by HT costs more than it gains. Unless FreeBSD is very poorly written indeed, the gain from hyperthreading should still exceed the slight increase in overhead incurred by multiprocessing logic. Hence, the loss
Re: hyper threading.
Hmm, on my boxes the combined sys and intr CPU rarely goes over 20% - most of the load is user space. I'd venture that most people running user space applications will see similar numbers. I agree that a box running as a router is not a good candidate for HT - that wasn't the question. John [EMAIL PROTECTED] wrote: When you get your machine running without a kernel let me know. The kernel is the key to the O/S. If you don't need networking and don't have many interrupts, then it probably doesn't matter that much. -Original Message- From: John Pettitt [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: freebsd-questions@freebsd.org Sent: Sat, 26 Mar 2005 17:23:40 -0800 Subject: Re: hyper threading. Well, you've proven that if you pick your benchmark you can get the result you want. So what that says is that the kernel network code doesn't get any benefit from HT - given that HT is supposed to benefit diverse user tasks and not multiple copies of the same code this is not big news - since you have a HT box how about running a less system code intensive and more diverse test? John [EMAIL PROTECTED] wrote: You can argue the technical theory all you want, but the measurements say otherwise. You guys have done it once again. Baited me into firing up a test that I already know the results of: Setup: Bridging em0 to em1. Load: 500Kpps, 60 bytes. 3.4GHz P4, 1MB Cache. FreeBSD 4.9 - Load: 38% (I put this in for fun :-). FreeBSD 5.4-Pre UP (no HT) - Load: high 55-60% range. FreeBSD 5.4-Pre SMP/HT - Load: 70-80% (much more jumping around). The bottom line is that if you don't test things to get real world results, you don't know crap. If that were true, then it would be equally true of systems with actual multiple physical processors. In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors. This shows that you really are a bit foggy.
Did you miss the part where with 2 processors you actually do have 2 processors? I can make an argument that networking with 1 processor on 5.4 is better than with 2. For example, with a test similar to the above, with 2 physical processors FreeBSD 5.4 will start dropping packets way before it hits 500Kpps unless you increase the interrupts/second, which of course increases the system load. And even with the dropped packets (which should reduce the load because it doesn't have to receive and transmit the packet), the load is still higher than for 4.x with a single processor. You and many others regularly say things like "SMP is obviously faster" or "Opterons are noticeably faster", but those statements are only true for certain applications. I've tested an Opteron 2.0GHz against a 3.4GHz P4, and the results are pretty interesting. For raw performance, i.e. interrupts/second handling, the P4 wins easily. The P4 wins out of the cache. But once you grow out of the cache and get more memory intensive, the Opteron beats it handily. So which is really faster? You could argue both depending on what benchmark you use. You have to test it in the environment where you plan to use it. Because the answer is almost never black and white. -Original Message- From: Anthony Atkielski [EMAIL PROTECTED] To: freebsd-questions@freebsd.org Sent: Sat, 26 Mar 2005 23:45:21 +0100 Subject: Re: hyper threading. [EMAIL PROTECTED] writes: Yes, the theory is very nice; you've done a nice job reading Intel's marketing garb. I haven't read their marketing materials. I'm simply going by the technical descriptions I've read of the architecture. However, if you don't have a specific hyperthreading-aware scheduler and particularly well-written, threaded applications, you'll lose more than you'll gain. If that were true, then it would be equally true of systems with actual multiple physical processors.
In practice, multiple processors provide an obvious performance gain, and hyperthreading does, too, although it's much more modest than the gain obtained from physically independent processors. Since FreeBSD's network stack isn't particularly well threaded, nor is the scheduler optimized for hyperthreading, you get a big mess at the kernel level. Nothing needs to be specially optimized for hyperthreading. All you need is at least two threads available for dispatch, with reasonably heterogeneous instruction mixes that can use different parts of the processor hardware at the same time. Real-world instruction mixes are often in this category in general-purpose operating systems. So if you have a nice application that does a lot of threaded math operations, you might think you've achieved something, Heavily math-oriented applications (or any group of applications that contains similar instruction mixes) are among