On Jul 2, 6:32 pm, Steve Holden <[EMAIL PROTECTED]> wrote:
> Karthik Gurusamy wrote:
> > On Jul 2, 3:01 pm, Steve Holden <[EMAIL PROTECTED]> wrote:
> >> Karthik Gurusamy wrote:
> >>> On Jul 1, 12:38 pm, dlomsak <[EMAIL PROTECTED]> wrote:
> >> [...]
>
> >>> I have found the stop-and-go between two processes on the same machine
> >>> leads to very poor throughput. By stop-and-go, I mean the producer and
> >>> consumer are constantly getting on and off of the CPU since the pipe
> >>> gets full (or empty for consumer). Note that a producer can't run at
> >>> its top speed as the scheduler will pull it out since its output pipe
> >>> got filled up.
> >> But when both processes are in the memory of the same machine and they
> >> communicate through an in-memory buffer, what's to stop them from
> >> keeping the CPU fully-loaded (assuming they are themselves compute-bound)?
>
> > If you are a producer and if your output goes thru' a pipe, when the
> > pipe gets full, you can no longer run. Someone must start draining the
> > pipe.
> > On a single core CPU when only one process can be running, the
> > producer must get off the CPU so that the consumer may start the
> > draining process.
>
> Wrong. The process doesn't "get off" the CPU, it remains loaded, and
> will become runnable again once the buffer has been depleted by the
> other process (which is also already loaded into memory and will become
> runnable as soon as a filled buffer becomes available).
>
huh? "get off" when talking about scheduling and CPU implies you are
not running. It is a common term to imply that you are not running --
it doesn't mean the process goes away from main memory. Sorry, where
did you learn your CS concepts?

>
> >>> When you increased the underlying buffer, you mitigated a bit this
> >>> shuffling. And hence saw a slight increase in performance.
> >>> My guess that you can transfer across machines at real high speed, is
> >>> because there is no process swapping as producer and consumer run on
> >>> different CPUs (machines, actually).
> >> As a concept that's attractive, but it's easy to demonstrate that (for
> >> example) two machines will get much better throughput using the
> >> TCP-based FTP to transfer a large file than they do with the UDP-based
> >> TFTP. This is because the latter protocol requires the sending unit to
> >> stop and wait for an acknowledgment for each block transferred. With
> >> FTP, if you use a large enough TCP sliding window and have enough
> >> content, you can saturate a link as long as its bandwidth isn't greater
> >> than your output rate.
>
> >> This isn't a guess ...
>
> > What you say about a stop-n-wait protocol versus TCP's sliding window
> > is correct.
> > But I think it's totally orthogonal to the discussion here. The issue
> > I'm talking about is how to keep the end nodes chugging along, if they
> > are able to run simultaneously. They can't if they aren't on a
> > multi-core CPU or on different machines.
>
> If you only have one CPU then sure, you can only run one process at a
> time. But your understanding of how multiple processes on the same CPU
> interact is lacking.
>

huh?

>
> >>> Since the two processes are on the same machine, try using a temporary
> >>> file for IPC. This is not as efficient as real shared memory -- but it
> >>> does avoid the IPC stop-n-go. The producer can generate the
> >>> multi-megabyte file at one go and inform the consumer.
> >>> The file-systems have
> >>> gone thru' decades of performance tuning that this job is done really
> >>> efficiently.
> >> I'm afraid this comes across a bit like superstition. Do you have any
> >> evidence this would give superior performance?
>
> > I did some testing before when I worked on boosting a shell pipeline
> > performance and found using file-based IPC was very good.
> > (some details
> > at http://kar1107.blogspot.com/2006/09/unix-shell-pipeline-part-2-using....
> > )
>
> > Thanks,
> > Karthik
>
> >>>> Thanks for the replies so far, I really appreciate you guys
> >>>> considering my situation and helping out.
>
> If you get better performance by writing files and reading them instead
> of using pipes to communicate then something is wrong.
>

Why don't you provide a better explanation for the observed behavior
than to just claim that a given explanation is wrong? I did mention
using real shared memory is better. I do know the cost of using a file
("physical disk movements") - but with the amount of buffering that goes
on in today's file-system implementations, for this problem, we will see
a big improvement.

Karthik

> regards
> Steve
> --
> Steve Holden +1 571 484 6266 +1 800 494 3119
> Holden Web LLC/Ltd http://www.holdenweb.com
> Skype: holdenweb http://del.icio.us/steve.holden
> --------------- Asciimercial ------------------
> Get on the web: Blog, lens and tag the Internet
> Many services currently offer free registration
> ----------- Thank You for Reading -------------
--
http://mail.python.org/mailman/listinfo/python-list
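
For anyone who wants to measure this rather than argue about it, here is
a rough sketch (not something posted in the thread) that hands the same
payload to a child process once through a pipe and once through a
temporary file. The names via_pipe, via_tempfile and CONSUMER, and the
8 MB payload size, are all made up for illustration. Both timings include
interpreter startup, and the numbers will depend on the OS, the pipe
buffer size and the number of cores, so treat it as a starting point for
your own test and not as evidence either way.

```python
import os
import subprocess
import sys
import tempfile
import time

# Hypothetical consumer: counts bytes read from stdin, or from the file
# named in argv[1] when a path is given, and prints the total.
CONSUMER = (
    "import sys\n"
    "src = open(sys.argv[1], 'rb') if len(sys.argv) > 1 else sys.stdin.buffer\n"
    "total = 0\n"
    "while True:\n"
    "    chunk = src.read(65536)\n"
    "    if not chunk:\n"
    "        break\n"
    "    total += len(chunk)\n"
    "print(total)\n"
)


def via_pipe(payload):
    """Feed payload to the consumer's stdin pipe; return (bytes_seen, seconds)."""
    start = time.perf_counter()
    proc = subprocess.Popen([sys.executable, "-c", CONSUMER],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(payload)
    return int(out), time.perf_counter() - start


def via_tempfile(payload):
    """Write the whole payload to a temp file first, then start the consumer."""
    start = time.perf_counter()
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        out = subprocess.check_output([sys.executable, "-c", CONSUMER, path])
    finally:
        os.remove(path)
    return int(out), time.perf_counter() - start


if __name__ == "__main__":
    data = b"x" * (8 * 1024 * 1024)  # 8 MB, an arbitrary size
    for name, fn in (("pipe", via_pipe), ("tempfile", via_tempfile)):
        seen, secs = fn(data)
        print(f"{name}: {seen} bytes in {secs:.3f}s")
```

Run it a few times with different payload sizes before drawing any
conclusion; a single run mostly measures process startup.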