On Tue, 2003-12-16 at 12:33, Andrew Stevens wrote:
> Hi all,
>
> First off a bit of background to the multi-threading in the current stable
> branch. First off:
>
> - Parallelism is primarily frame-by-frame. This means that the final phases
> of the encoding lock on completion of the reference frame (prediction and DCT
> transform) and the predecessor (bit allocation). If you have a really fast
> CPU that motion estimates and DCT's very fast you will get lower
> parallelisation. If you use -R 0 you will get very little parallelism *at
> all*. Certainly not enough to make -M 3 sensible.

Yet again, good to know.

This line (generally, a triple loop for 0-3 M, 0-1 I and 0-2 R):

Produces these encoding times for approximately 1010 frames. The columns
are real time / user time, which gives a bit of a view as to how busy the
CPUs were during the real time (optimal should be 1m real time, 2m user
time, right?). Average system time was 3.0s, +/- 0.2s, for all tests. The
last column is the change in real time against the -M 0 run with the same
-I and -R.

Options on each call were:
-f 8 -g 9 -G 18 -v 0 -E -10 -K kvcd -4 2 -2 1 -F 1 < rawstream.yuv

options            real         user         vs. -M 0
-M 0 -I 0 -R 0:    1m  6.082s   0m 50.050s   baselines
-M 0 -I 0 -R 1:    1m 16.545s   0m 58.980s   ..
-M 0 -I 0 -R 2:    1m 34.511s   1m 17.045s   ..
-M 0 -I 1 -R 0:    2m  7.344s   1m 49.495s   ..
-M 0 -I 1 -R 1:    1m 59.665s   1m 42.215s   ..
-M 0 -I 1 -R 2:    2m 30.990s   2m 30.990s   ..

-M 1 -I 0 -R 0:    1m  5.713s   0m 49.800s   -0.35s
-M 1 -I 0 -R 1:    1m 15.305s   0m 58.975s   -1.2s
-M 1 -I 0 -R 2:    1m 34.057s   1m 17.090s   -0.5s
-M 1 -I 1 -R 0:    2m  5.928s   1m 49.700s   -1.3s
-M 1 -I 1 -R 1:    1m 59.019s   1m 41.955s   -0.6s
-M 1 -I 1 -R 2:    2m 49.149s   2m 31.440s   +19.2s

-M 2 -I 0 -R 0:    1m  0.503s   0m 25.930s   -5.5s
-M 2 -I 0 -R 1:    0m 53.418s   0m 58.950s   -23s
-M 2 -I 0 -R 2:    1m  7.418s   1m 18.145s   -27s
-M 2 -I 1 -R 0:    1m 54.534s   1m 50.060s   -13s
-M 2 -I 1 -R 1:    1m 15.489s   0m  1.040s   -- uhm...?
-M 2 -I 1 -R 2:    1m 54.720s   1m 16.720s   -36s

-M 3 -I 0 -R 0:    0m 57.533s   0m 50.610s   -8.5s
-M 3 -I 0 -R 1:    0m 51.541s   0m 40.265s   -25s
-M 3 -I 0 -R 2:    1m  5.996s   0m 54.325s   -29s
-M 3 -I 1 -R 0:    1m 50.570s   1m 49.715s   -17s
-M 3 -I 1 -R 1:    1m 14.462s   1m  8.530s   -45s
-M 3 -I 1 -R 2:    1m 36.192s   0m 52.145s   -54s

Interestingly, and I think this has to do with the I/O buffering, -M 0 is
slower than -M 1 by a small fraction in nearly every test (-I 1 -R 2 is the
exception). And as Steven Shultz had suggested, -I 1 is a bad bad idea. It
never improved performance, and in fact made it quite a bit worse (the man
page is right :).
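
For anyone who wants to recheck the last column, here is a minimal Python sketch that recomputes the real-time delta against the matching -M 0 run, plus the user/real ratio as a rough "how busy were the CPUs" figure. It was not part of the original test run: the names parse_time, delta_vs_baseline and cpu_ratio are made up for illustration, and only four rows are copied in from the table.

```python
import re


def parse_time(text):
    """Convert a `time`-style string such as '1m 6.082s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+(?:\.\d+)?)s?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (M, I, R) -> (real, user); a few rows copied from the table above.
RUNS = {
    (0, 0, 1): ("1m 16.545s", "0m 58.980s"),
    (3, 0, 1): ("0m 51.541s", "0m 40.265s"),
    (0, 1, 2): ("2m 30.990s", "2m 30.990s"),
    (3, 1, 2): ("1m 36.192s", "0m 52.145s"),
}


def delta_vs_baseline(runs, key):
    """Real-time difference against the -M 0 run with the same -I and -R."""
    _, i, r = key
    return parse_time(runs[key][0]) - parse_time(runs[(0, i, r)][0])


def cpu_ratio(runs, key):
    """User time divided by real time (2.0 would mean two fully busy CPUs)."""
    real, user = (parse_time(t) for t in runs[key])
    return user / real


if __name__ == "__main__":
    for key in sorted(RUNS):
        m, i, r = key
        print(
            f"-M {m} -I {i} -R {r}: "
            f"delta {delta_vs_baseline(RUNS, key):+.1f}s, "
            f"user/real {cpu_ratio(RUNS, key):.2f}"
        )
```

With these rows, -M 3 -I 0 -R 1 comes out at about 25 seconds under its baseline and -M 3 -I 1 -R 2 at a bit under 55 seconds, in line with the table.
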
(Of course, -M 1 will be at least two processes, and since I have a real
dual system, it makes sense that it edges out -M 0; that may not hold true
for a single CPU.)

Also, encoding with one B frame is a touch faster in -I 1 mode than encoding
without them, but it is slower when you encode two B frames instead of just
one. I find this interesting. I would have expected a single B frame to
take a bit longer than none at all, and that is the case with -I 0, but not
with -I 1. Any ideas on that one?

In the end -M 3 is not reasonably faster in -I 0 -R 0, but flies along at
-I 0 -R 2 compared to baseline, and gets fair gains at -I 0 -R 1, which
drops encoding time by another 14 seconds relative to -R 2 for the same
frameset. So, does this boil down to the fastest is -M 3 -I 0 -R 1?

The numbers on -M 3 -I 1 -R 2 show a 54 second improvement over the tests
with -M 0, but it takes almost 50% longer than -M 3 -I 0 -R 1. The file
size of 3-1-2 is 13,807,067 and the file size of 3-0-1 is 13,402,673. The
3-0-1 file is smaller and is encoded faster, and viewing them now, the
quality is at least on par (3-0-1 looked a tad better).

> - There is also a parallel read-ahead thread but this rarely soaks much CPU on
> modern CPUs.
>
> The MPEG_DEVEL branch encoder stripes all encoding phases to allow much more
> scalable parallelisation. You might want to give it a go - I'd be interested
> in the results!

I'd love to, but I couldn't find it in CVS. I found everything else in the
SF CVS branch, but not mjpegtools itself.

> N.b. in a 'realistic' scenario you're running the multiplexer and audio
> encoding in parallel with the encoder and video filters communicating via
> pipes and named FIFO's. This setup usually saturate a modern dual machine

No multiplexing and no audio encoding (AC3 pass through and multiplexing of
DVD streams is done after completion of the video encoding).
There is the overhead of decoding the original MPEG2 stream into YUV, but
that's about all else that transcode (which I'm using) is dumping into the
pipe. I avoided any of that on this run by just dumping the file in an
already decoded format (pgmtoy4m output).

> cheers,
>
> Andrew
> PS
> I'm away on vacation for a couple of weeks from friday so there'll be a bit of
> pause in answering emails / posts from then ;-)