Re: Performance of floating point instructions
On Wednesday 10 March 2010, Laurent GUERBY wrote: > On Wed, 2010-03-10 at 21:54 +0200, Siarhei Siamashka wrote: > > I wonder why the compiler does not use real NEON instructions with > > -ffast-math option, it should be quite useful even for scalar code. > > > > something like: > > > > vld1.32 {d0[0]}, [r0] > > vadd.f32 d0, d0, d0 > > vst1.32 {d0[0]}, [r0] > > > > instead of: > > > > flds s0, [r0] > > faddss0, s0, s0 > > fsts s0, [r0] > > > > for: > > > > *float_ptr = *float_ptr + *float_ptr; > > > > At least NEON is pipelined and should be a lot faster on more complex > > code examples where it can actually benefit from pipelining. On x86, SSE2 > > is used quite nicely for floating point math. > > Hi, > > Please open a report on http://gcc.gnu.org/bugzilla with your test > sources and command line, at least GCC developpers will notice there's > interest :). This sounds reasonable :) > GCC comes with some builtins for neon, they're defined in arm_neon.h > see below. This does not sound like a good idea. If the code has to be modified and changed into something nonportable, there are way better options than intrinsics. Regarding the use of NEON instructions via C++ operator overloading. A test program is attached. # gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -ffast-math -o neon_float neon_float.cpp === ieee754 floats === real0m3.396s user0m3.391s sys 0m0.000s === runfast floats === real0m2.285s user0m2.273s sys 0m0.008s === NEON C++ wrapper === real 0m1.312s user 0m1.313s sys 0m0.000s But the quality of generated code is quite bad. That's also something to be reported to gcc bugzilla :) -- Best regards, Siarhei Siamashka #include #include #if 1 class fast_float { float32x2_t data; public: fast_float(float x) { data = vset_lane_f32(x, data, 0); } fast_float(const fast_float &x) { data = x.data; } fast_float(const float32x2_t &x) { data = x; } operator float () { return vget_lane_f32(data, 0); } friend fast_float operator+(const fast_float &a, const fast_float &b); friend fast_float operator*(const fast_float &a, const fast_float &b); const fast_float &operator+=(fast_float a) { data = vadd_f32(data, a.data); return *this; } }; fast_float operator+(const fast_float &a, const fast_float &b) { return vadd_f32(a.data, b.data); } fast_float operator*(const fast_float &a, const fast_float &b) { return vmul_f32(a.data, b.data); } #else typedef float fast_float; #endif float f(float *a, float *b) { int i; fast_float accumulator = 0; for (i = 0; i < 1024; i += 16) { accumulator += (fast_float)a[i + 0] * (fast_float)b[i + 0]; accumulator += (fast_float)a[i + 1] * (fast_float)b[i + 1]; accumulator += (fast_float)a[i + 2] * (fast_float)b[i + 2]; accumulator += (fast_float)a[i + 3] * (fast_float)b[i + 3]; accumulator += (fast_float)a[i + 4] * (fast_float)b[i + 4]; accumulator += (fast_float)a[i + 5] * (fast_float)b[i + 5]; accumulator += (fast_float)a[i + 6] * (fast_float)b[i + 6]; accumulator += (fast_float)a[i + 7] * (fast_float)b[i + 7]; accumulator += (fast_float)a[i + 8] * (fast_float)b[i + 8]; accumulator += (fast_float)a[i + 9] * (fast_float)b[i + 9]; accumulator += (fast_float)a[i + 10] * (fast_float)b[i + 10]; accumulator += (fast_float)a[i + 11] * (fast_float)b[i + 11]; accumulator += (fast_float)a[i + 12] * (fast_float)b[i + 12]; accumulator += (fast_float)a[i + 13] * (fast_float)b[i + 13]; accumulator += (fast_float)a[i + 14] * (fast_float)b[i + 14]; accumulator += (fast_float)a[i + 15] * (fast_float)b[i + 15]; } return accumulator; } volatile float dummy; float buf1[1024]; float buf2[1024]; int main() { int i; int tmp; __asm__ volatile( "fmrx %[tmp], fpscr\n" "orr%[tmp], %[tmp], #(1 << 24)\n" /* flush-to-zero */ "orr%[tmp], %[tmp], #(1 << 25)\n" /* default NaN */ "bic%[tmp], %[tmp], #((1 << 15) | (1 << 12) | (1 << 11) | (1 << 10) | (1 << 9) | (1 << 8))\n" /* clear exception bits */ "fmxr fpscr, %[tmp]\n" : [tmp] "=r" (tmp) ); for (i = 0; i < 1024; i++) { buf1[i] = buf2[i] = i % 16; } for (i = 0; i < 10; i++) { dummy = f(buf1, buf2); } printf("%f\n", (double)dummy); return 0; } ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Performance of floating point instructions
On Wednesday 10 March 2010, Laurent Desnogues wrote: > Even if fast-math is known to break some rules, it only > breaks C rules IIRC. OTOH, NEON FP has no support > for NaN and other nice things from IEEE754. And just checked gcc man page to verify this stuff. -ffast-math Sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range. -ffinite-math-only Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs. This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. So looks like -ffast-math already assumes no support for NaNs. Even if there are other nice IEEE754 things preventing NEON from being used with -ffast-math, an appropriate new option relaxing this requirement makes sense to be invented. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Performance of floating point instructions
On Wednesday 10 March 2010, Laurent Desnogues wrote: > On Wed, Mar 10, 2010 at 8:54 PM, Siarhei Siamashka > wrote: > [...] > > > I wonder why the compiler does not use real NEON instructions with > > -ffast-math option, it should be quite useful even for scalar code. > > > > something like: > > > > vld1.32 {d0[0]}, [r0] > > vadd.f32 d0, d0, d0 > > vst1.32 {d0[0]}, [r0] > > > > instead of: > > > > flds s0, [r0] > > fadds s0, s0, s0 > > fsts s0, [r0] > > > > for: > > > > *float_ptr = *float_ptr + *float_ptr; > > > > At least NEON is pipelined and should be a lot faster on more complex > > code examples where it can actually benefit from pipelining. On x86, SSE2 > > is used quite nicely for floating point math. > > Even if fast-math is known to break some rules, it only > breaks C rules IIRC. If that's the case, some other option would be handy. Or even a new custom data type like float_neon (or any other name). Probably it is even possible with C++ and operators overloading. > OTOH, NEON FP has no support > for NaN and other nice things from IEEE754. > > Anyway you're perhaps looking for -mfpu=neon, no? I lost my faith in gcc long ago :) So I'm not really looking for anything. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Performance of floating point instructions
On Wednesday 10 March 2010, Laurent Desnogues wrote: > On Wed, Mar 10, 2010 at 7:29 PM, Alberto Mardegan > > So, it seems that there's a huge improvements when switching from doubles > > to floats; although I wonder if it's because of the FPU or just because > > the amount of data passed around is smaller. > > On the other hand, the improvements obtained by enabling the fast FPU > > mode is rather small -- but that might be due to the fact that the FPU > > operations are not a major player in this piece of code. > > The "fast" mode only gains 1 or 2 cycles per FP instruction. > The FPU on Cortex-A8 is not pipelined and the fast mode > can't change that :-) It's probably http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/ch16s07s01.html vs. http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/BCGEIHDJ.html I wonder why the compiler does not use real NEON instructions with -ffast-math option, it should be quite useful even for scalar code. something like: vld1.32 {d0[0]}, [r0] vadd.f32 d0, d0, d0 vst1.32 {d0[0]}, [r0] instead of: flds s0, [r0] faddss0, s0, s0 fsts s0, [r0] for: *float_ptr = *float_ptr + *float_ptr; At least NEON is pipelined and should be a lot faster on more complex code examples where it can actually benefit from pipelining. On x86, SSE2 is used quite nicely for floating point math. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Performance of floating point instructions
On Wednesday 10 March 2010, Alberto Mardegan wrote: > Alberto Mardegan wrote: > > Does one have any figure about how the performance of the FPU is, > > compared to integer operations? > > I added some profiling to the code, and I measured the time spent by a > function which is operating on an array of points (whose coordinates are > integers) and trasforming each of them into a geographic coordinates > (latitude and longitude, floating point) and calculating the distance > from the previous point. > > http://vcs.maemo.org/git?p=maemo-mapper;a=shortlog;h=refs/heads/gps_control > map_path_calculate_distances() is in path.c, > calculate_distance() is in utils.c, > unit2latlon() is a pointer to unit2latlon_google() in tile_source.c > > > The output (application compiled with -O0): Using an optimized build (-O2 or -O3) may sometimes change the overall picture quite dramatically. It makes almost no sense benchmarking -O0 code, because in this case all the local variables are kept in memory and are read/written before/after each operation. It's substantially different from normal code. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Problem with garage's project being refused
On Mon, Mar 2, 2009 at 4:57 PM, Sarah Newman wrote: > On that note, what do we do if project is there but completely unused > and we want to delete it? I know I'm not the only one ;) Probably first try contact the person who registered the project and ask him(her) to transfer ownership to you. Maybe they don't have time or have other reasons not to work on the project actively and will be glad to know that somebody wants to take it over. -- Best regards Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Screen orientation
On Tuesday 27 January 2009, gary liquid wrote: > the answer to the identification problem is by querying the xrandr x11 > extension library. > > http://www.xfree86.org/current/Xrandr.3.html > > Kamen correctly points out however that a user with the default > installation of maemo does not need to query this interface, there is only > 1 possible default orientation: landscape. > > Future versions of maemo will hopefully have a fully working xrandr > implementation and allow rotation to be queried and controlled in the > default system. > > *note to nokians reading, PLEASE make sure this works and also confirm that > XV rotates correctly as well ;)* > > Gary Birkett (lcuk in #maemo) By the way, have you reported this XV rotation problem to the authors of the unofficial rotation patch? What did they reply? -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Does anyone know the mechanism in Nokia's LCD driver?
On Tuesday 13 January 2009, Frantisek Dufka wrote: > Siarhei Siamashka wrote: > > XV should make a perfect backend for SDL, because it maps fine on SDL > > API (SDL_SetVideoMode/SDL_Flip/...). In general, XV is a good backend > > for anything that uses double-buffered or triple-buffered > > fullscreen/fullwindow blits. It is possible to get ~27.5 frames per > > second in 800x480 resolution for 16bpp rgb color format without > > tearing. With a lower resolution it is possible to go up to ~55 frames > > per second. > > Hmm, I hope there is some work done on this front (Xv or even openGLES > based backend for SDL) for Fremantle. > The alpha release has same old 1.2.8-23 SDL version though. Or is there some > alternative to SDL for 2D graphics? I was only talking about N800/N810 hardware, its kernel, xserver, SDL and how to best use them on the current generation of internet tablets. Fremantle is a completely different subject and I don't feel like discussing it yet. It surely will bring new exciting features and challenges. Regarding improved SDL or whatever is needed for 2D games, in any case it is one of the things that the community can do with or without official Nokia support. > BTW, as for the external lcd controller status - I have just checked and >CONFIG_FB_OMAP_LCDC_EXTERNAL is disabled for both RX-51 > confugurations in Fremantle alpha kernel (unlike e.g. PowerVR stuff) so > maybe we finally has directly mapped framebuffer in SDRAM ? We can discuss this stuff after the actual HW is out :) -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: gprof-base Profiling: "no time accumulated"
On Saturday 24 January 2009, Thomas Thrainer wrote: > Hello, > > I'm trying to profile an application on the N810 with gprof. I compiled it > with -pg, -O3 and -g and linked all the libraries I'm interested in > statically (and compiled them also with -pg -O3 -g). > > The first couple of problems were that I got a floating point exception in > scratchbox when starting the program, and a crash on the tablet. Using > -fno- unit-at-a-time and -fno-omit-frame-pointer solves the problem on the > tablet, it still doesn't run in scratchbox tough. By the way, this is > related to -O3, non-optimized build run fine in both scratchbox and on the > tablet. > > My problem is however, that profiling doesn't give any usable results. In > the profile written to gmon.out there are all times 0.0. But the call graph > and calling count for functions is correct. > > I tried to strace the application, and profiling seems to work normally. > The profiling code sets up the profiling timer, and the SIGPROF signal is > received regularly throughout the program run. > So I suspect the profiling code, or more precisely the SIGPROF-handler, to > not being able to get the currently executing function based on the stack. > My program is not spending a lot of time in some library functions or such, > most of the time it's usually in some user-functions (I know that based on > profiling on my PC). > > Can anybody shed some light on this issue? To I have to link against some > special version of glibc? Or is profiling with gprof broken on ARM's? IIRC, there might be something wrong with the toolchain in the respect of support for -pg option. On the other hand, do you really need to use gprof? Profiling on N810 can be done with oprofile, which covers all the gprof functionality and provides a lot more features. Please check the following page: http://maemo.org/development/tools/doc/diablo/oprofile/ If you think that for your specific case gprof is better and doubt that oprofile can handle it well, please describe what exactly you want to do. I'll try to advice something. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Does anyone know the mechanism in Nokia's LCD driver?
On Tue, Jan 13, 2009 at 12:23 PM, Felipe Contreras wrote: > On Tue, Jan 13, 2009 at 12:05 PM, Igor Stoppa wrote: >> Hi, >> On Tue, 2009-01-13 at 18:06 +0800, ext Huang Gao (Gmail) wrote: >>> Hi, Igor Stoppa: >>> Thank you for your reply! >>> So can I understand that this hardware FB is not contained in SDRAM >>> or SRAM, and LCD will refresh itself from this hardware FB by its controller >>> automatically, without the help of OMAP DMA channel? >> >> I'm in no way a display guy but iirc there are 2 modes for refresh: >> -auto: whatever is written to the framebuffer goes through straight to >> the LCD >> -manual: the image needs to be flushed to the LCD >> >> If you are interested, you can check it from the kernel source files. > > Would the manual mode help to avoid tearing? Yes, and it does help to avoid tearing. At least this works fine for XV extension. But getting tearfree scrolling/panning in GTK applications for example is a bit more challenging. I can provide a more detailed explanation if anybody is interested. XV should make a perfect backend for SDL, because it maps fine on SDL API (SDL_SetVideoMode/SDL_Flip/...). In general, XV is a good backend for anything that uses double-buffered or triple-buffered fullscreen/fullwindow blits. It is possible to get ~27.5 frames per second in 800x480 resolution for 16bpp rgb color format without tearing. With a lower resolution it is possible to go up to ~55 frames per second. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Does anyone know the mechanism in Nokia's LCD driver?
On Tue, Jan 13, 2009 at 10:56 AM, Frantisek Dufka wrote: > BTW, there is kernel ioctl to set automatic refresh and the refresh rate > can be tweaked in kernel source but the results are suboptimal. Maybe at > least for Nokia 770 it would be possible to use tearsync flag for such > automatic update and set the update timer to lcd refresh rate or lcd > refresh rate/2 to get nice result. N8x0 cannot update whole screen in > one lcd refresh cycle anyway so you'll always get tearing on the bottom. N8x0 can update whole screen in two lcd refresh cycles (but admittedly it just barely crosses this limit), which is enough to have no tearing. But this is only true when running at OP 0 (400MHz). When running at other operating points, RFBI is slower. It may be interesting to experiment with better integration of omapfb kernel driver with DVFS and try to always switch (at least temporarily) to OP 0 when performing screen updates. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Projects Nokia should support (yours?)
On Saturday 01 November 2008, Ryan Abel wrote: > On Nov 1, 2008, at 5:47 AM, Simon Pickering wrote: > > Something with more mileage, though still not really a killer app, > > would be working on optimisation of the backend libs that all three > > media players use (therefore any media player would benefit). But this > > will have to wait until we see what the hardware is capable of really. > > Considering the hardware is already doing 720p with only NEON > optimizations, well. . . . There are always ways to make a powerful hardware run as fast as a snail ;) Especially when excessively complex frameworks and layers are used, chances of having something implemented inefficiently in between get a bit higher. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: ALSA sound driver for Nokia 770 and DSP programming
On Friday 26 September 2008, Robert Schuster wrote: > Hi, > > Siarhei Siamashka schrieb: > > Recently I have been trying to make it running and seems like we have a > > very good chance to have it working nicely. It is also interesting, that > > the linux-omap guys seem to be developing a new driver [3] for AIC23 > > which may eventually become a better alternative. > > Very nice! > > I will try your patch in mamona which IMO provides the most easy way to > test those things out. That's great, feedback is very much welcome. Though bugreports can wait until I roll out the next revision of the patch ;) > Mamona is based on OpenEmbedded and every other > distribution made from it bases its sound core on ALSA not gstreamer ... Somehow I also like this approach better ... > It is good that a new driver is in development, however I have doubts > that we will be able to run 2.6.27+ on the N770 soon ;). Sure, upgrading the kernel may end up in fixing one problem, but introducing a lot more of them instead :) But we could try to backport the new driver to 2.6.16 once it is ready. > Is the AIC23 also in the N8x0 devices? In addition to the information already provided by Simon, I can recommend to search linux-omap and alsa-devel mailing lists. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: ALSA sound driver for Nokia 770 and DSP programming
On Friday 26 September 2008, Simon Pickering wrote: > > > Recently I have been trying to make it running and seems > > > like we have a very > > > good chance to have it working nicely. It is also > > > interesting, that the > > > linux-omap guys seem to be developing a new driver [3] for > > > AIC23 which may > > > eventually become a better alternative. > > > Very nice! > > Good stuff Siarhei :) > > Have you built a replacement DSP-kernel yet? Yes, sure. You can easily build DSP kernel and demo_console DSP task from the sources in dspgw-3.3-dsp.tar.bz2 All that is need to done is to run 'make' in 'tokliBIOS' (*), 'tinkernel' and 'apps/demo_mod' subdirectories. You will get 'tinkernel.out' and 'demo_console.o' binary files which are DSP kernel and DSP task respectively. After that, have a look into dspgw-3.3-dynamic-demo-omap1.tar.bz2 archive for the target directory layout and README file with the instructions how to test it. Of course you can replace 'tinkernel.out' and 'demo_console.o' with the files that you have compiled yourself. ARM side binaries also need to be recompiled before you can run them (they were compiled for OABI and will not run out of the box), but that's a minor issue. Only 'dsp_dld' compilation may cause problems because it needs a more up to date version of flex than the one that is part of OS2006 SDK. (*) Actually you need to apply a patch to 'tokliBIOScfg.tcf' if you want to use the generated 'tokliBIOScfg.cmd' instead of tinkernelcfg.cmd'. That was actually the hardest part. Now it is possible to experiment with configuring DSP kernel by changing .tcf file and enabling different kernel features. Documentation which explains its syntax is available in free DSP toolchain. So DSP programming for 770 and other OMAP1 based devices should be perfectly fine. But I can't say the same for N8x0 at the moment, because free DSP toolchain from TI does not support compilation of DSP kernel for OMAP2 according to dspgateway documentation. -- Best regards, Siarhei Siamashka --- dspgw-3.3-dsp.orig/tokliBIOS/tokliBIOScfg.tcf 2005-06-09 07:28:21.0 +0300 +++ dspgw-3.3-dsp/tokliBIOS/tokliBIOScfg.tcf 2008-09-27 03:00:40.0 +0300 @@ -47,6 +47,10 @@ bios.MEM.MALLOCSEG = prog.get("DARAM"); bios.MEM.BIOSOBJSEG = prog.get("DARAM"); +var extmem = bios.MEM.create("EXTMEM"); +extmem.base = 0x14000; +extmem.len = 0x1000; +extmem.createHeap = false; /* * CLK ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: ALSA sound driver for Nokia 770 and DSP programming
On Friday 26 September 2008, Felipe Contreras wrote: > On Fri, Sep 26, 2008 at 12:06 AM, Siarhei Siamashka > > <[EMAIL PROTECTED]> wrote: > > On Thursday 25 September 2008, Felipe Contreras wrote: > >> On Thu, Sep 25, 2008 at 10:07 PM, Siarhei Siamashka > > > > [...] > > > >> > Now regarding why we may want it. Once if we get a good, low latency, > >> > fully functional and reliable ALSA sound driver running on ARM, it > >> > gives maemo community a nice possibility to scrap all the proprietary > >> > DSP binaries. This provides us with a new and shiny 252MHz C55x DSP > >> > core ready to be used by something else :) > >> > > >> > Free linux DSP toolchain from TI [4] supports generation of both DSP > >> > kernel and DSP tasks for OMAP1 based devices which is sufficient for > >> > DSP development. The toolchain license was supposed to permit open > >> > source development (with noncommercial restriction), though the > >> > license text itself is a bit questionable [5]. > >> > > >> > With DSP avalable for use and having no need to spend efforts on > >> > ensuring compatibility and peaceful coexistence with proprietary > >> > binary codecs (free and proprietary code does not mix well), it should > >> > be possible to turn Nokia 770 into quite a powerful media player. > >> > >> Great stuff! > >> > >> Do you plan to use the dsp-gateway or dsp-bridge? > > > > Now as you mentioned that, it indeed makes sense to consider other > > alternatives if they exist. Do you have any links to the information > > about dspgateway vs. dspbridge comparison > > (features/performance/reliability)? > > > > Using dspgateway has a clear advantage that it is already included in the > > kernel. And dspgateway is more or less ok, though patching it a bit in > > order to improve performance will be required. > > Not really, but I've been thinking that a comparison would be useful. > Perhaps some dummy DSP nodes and clients to test them on both would > help. I have one for the dsp-bridge, but not dsp-gateway. The first thing that I did when experimenting with dspgateway was implementation of some simple low level benchmarks to measure communication time between ARM and DSP and data transfer performance. The results were quite interesting. I think that now I know dspgateway problems, its bottlenecks and have some ideas about how to fix them. But that's a topic of another long post and I'll try to share this information later. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: ALSA sound driver for Nokia 770 and DSP programming
On Thursday 25 September 2008, Felipe Contreras wrote: > On Thu, Sep 25, 2008 at 10:07 PM, Siarhei Siamashka [...] > > Now regarding why we may want it. Once if we get a good, low latency, > > fully functional and reliable ALSA sound driver running on ARM, it gives > > maemo community a nice possibility to scrap all the proprietary DSP > > binaries. This provides us with a new and shiny 252MHz C55x DSP core > > ready to be used by something else :) > > > > Free linux DSP toolchain from TI [4] supports generation of both DSP > > kernel and DSP tasks for OMAP1 based devices which is sufficient for DSP > > development. The toolchain license was supposed to permit open source > > development (with noncommercial restriction), though the license text > > itself is a bit questionable [5]. > > > > With DSP avalable for use and having no need to spend efforts on ensuring > > compatibility and peaceful coexistence with proprietary binary codecs > > (free and proprietary code does not mix well), it should be possible to > > turn Nokia 770 into quite a powerful media player. > > Great stuff! > > Do you plan to use the dsp-gateway or dsp-bridge? Now as you mentioned that, it indeed makes sense to consider other alternatives if they exist. Do you have any links to the information about dspgateway vs. dspbridge comparison (features/performance/reliability)? Using dspgateway has a clear advantage that it is already included in the kernel. And dspgateway is more or less ok, though patching it a bit in order to improve performance will be required. -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
ALSA sound driver for Nokia 770 and DSP programming
Hi, As has been discovered long ago [1] but eventually forgotten, Nokia 770 has AIC23 audio hardware [2] which can be used not only from DSP side, but from ARM as well. Moreover, OS2006 kernel sources even contain an ARM driver for it, but this driver is disabled (that's understandable as the driver is not in a very good shape and has quite a number of bugs). Recently I have been trying to make it running and seems like we have a very good chance to have it working nicely. It is also interesting, that the linux-omap guys seem to be developing a new driver [3] for AIC23 which may eventually become a better alternative. Kernel patch is attached. It enables AIC32 driver, adds a hack to power on/off code so that audio codec is permanently powered on (power on/off code is not reliable and needs to be reworked). Also it fixes a problem with audio stuttering on video playback in mplayer (the driver had broken position reporting which is critical for proper audio/video synchronization). Here is some usage instruction (beware that standard disclaimer applies: you can use this patch at your own risk, this code is quite untested. If it somehow manages to fry your device, you have been warned and I'm not responsible for any breakages): 1. Disable esd daemon and DSP stuff in order to move it out of the way (temporarily rename '/usr/bin/esd' and '/usr/sbin/dsp_dld' to something else) 2. Apply the attached patch to OS2006 kernel, compile and flash it to the device 3. Compile and install alsa userspace library, I used alsa-lib-1.0.11.tar.bz2 4. Put attached 'asound.conf' into '/etc' directory on the device, it enables dmix plugin for audio mixing and resampling 5. Compile and try some applications which use ALSA, I tested 'aplay' and 'mplayer' The driver is semi-usable now, but a lot still needs to be done: * proper power management to avoid excessive battery drain * audio volume control * switch between speaker/headphone * audio quality is a bit crappy now, this needs to be fixed * maybe some more fixes for bugs that are yet to be discovered... DMA code is quite suspicious (especially the way it does channels linking) and might be responsible for audio quality issues. Also sofware mixing/resampling code in dmix plugin can benefit from ARM optimizations. Now regarding why we may want it. Once if we get a good, low latency, fully functional and reliable ALSA sound driver running on ARM, it gives maemo community a nice possibility to scrap all the proprietary DSP binaries. This provides us with a new and shiny 252MHz C55x DSP core ready to be used by something else :) Free linux DSP toolchain from TI [4] supports generation of both DSP kernel and DSP tasks for OMAP1 based devices which is sufficient for DSP development. The toolchain license was supposed to permit open source development (with noncommercial restriction), though the license text itself is a bit questionable [5]. With DSP avalable for use and having no need to spend efforts on ensuring compatibility and peaceful coexistence with proprietary binary codecs (free and proprietary code does not mix well), it should be possible to turn Nokia 770 into quite a powerful media player. 1. http://lists.maemo.org/pipermail/maemo-developers/2006-June/022231.html 2. http://focus.ti.com/docs/prod/folders/print/tlv320aic23b.html 3. http://thread.gmane.org/gmane.linux.ports.arm.omap/11700/focus=11709 4. https://www-a.ti.com/downloads/sds_support/targetcontent/LinuxDspTools/index.html 5. http://www.gossamer-threads.com/lists/maemo/developers/30611 -- Best regards, Siarhei Siamashka diff --git a/arch/arm/mach-omap1/board-nokia770.c b/arch/arm/mach-omap1/board-nokia770.c index 3862a77..90f113a 100644 --- a/arch/arm/mach-omap1/board-nokia770.c +++ b/arch/arm/mach-omap1/board-nokia770.c @@ -33,6 +33,8 @@ #include #include #include +#include +#include extern void nokia770_ts_init(void); extern void nokia770_mmc_init(void); @@ -67,6 +69,42 @@ static int nokia770_keymap[] = { 0 }; +#define DEFAULT_BITPERSAMPLE 16 + +static struct omap_mcbsp_reg_cfg mcbsp_regs = { +.spcr2 = FREE | FRST | GRST | XRST | XINTM(3), +.spcr1 = RINTM(3) | RRST, +.rcr2 = RPHASE | RFRLEN2(OMAP_MCBSP_WORD_8) | +RWDLEN2(OMAP_MCBSP_WORD_16) | RDATDLY(0), +.rcr1 = RFRLEN1(OMAP_MCBSP_WORD_8) | RWDLEN1(OMAP_MCBSP_WORD_16), +.xcr2 = XPHASE | XFRLEN2(OMAP_MCBSP_WORD_8) | +XWDLEN2(OMAP_MCBSP_WORD_16) | XDATDLY(0) | XFIG, +.xcr1 = XFRLEN1(OMAP_MCBSP_WORD_8) | XWDLEN1(OMAP_MCBSP_WORD_16), +.srgr1 = FWID(DEFAULT_BITPERSAMPLE - 1), +.srgr2 = GSYNC | CLKSP | FSGM | FPER(DEFAULT_BITPERSAMPLE * 2 - 1), +/*.pcr0 = FSXM | FSRM | CLKXM | CLKRM | CLKXP | CLKRP,*/ /* mcbsp: master */ +.pcr0 = CLKXP | CLKRP, /* mcbsp: slave */ +}; + +static struct omap_alsa_codec_config alsa_config = { +.name = "Nokia770 AIC23", +.mcbsp_reg
Re: DSP SBC encoder update
On Thu, Jul 10, 2008 at 6:57 PM, Simon Pickering <[EMAIL PROTECTED]> wrote: > No, I understood, I was just mentioning that there appear to be two > heaps to chose from - presumably one is used by the DSP tasks (malloc is > probably #defined as one of the CSL MEM* fns in the DSP Gateway task > functions). Maybe it is just clever/stupid enough to do the allocation automatically. At least when I did some experiments with DSP before, it was alocating DARAM memory. Surely, you might want to have better control to put most performance critical data into DARAM, but malloc is a standard C function and is more portable. >> Yes, accessing SDRAM memory is extremely slow. And if you access SDRAM >> memory using 16-bit accesses instead of 32-bit accesses, the overhead >> doubles. So if your data processing algorithm does not deal > exclusively >> with 32-bit data accesses, you are better not to run it to process > data >> in SDRAM memory. Copying data to a temporary buffer in DARAM or >> SARAM, processing it there and copying results back to SDRAM would be >> faster in this case. > > The X[] array data type is an int32, so even accessing 32bit from SDRAM > is still slower than using a local buffer (depending on what you need to > do with it of course). It depends on how many times the data is accessed. For example, if you have some algorithm that accesses this memory location 10 times, you would have 2 SDRAM + 10 SRAM memory accesses by using fetch/process/store pattern vs. just 10 SDRAM memory accesses if working with this buffer directly in SDRAM. As SDRAM is an order of magnitude slower (decimal order, not binary), you really want to avoid dealing with SDRAM as much as possible. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: DSP SBC encoder update
On Thu, Jul 10, 2008 at 5:09 PM, Simon Pickering <[EMAIL PROTECTED]> wrote: >> > I looked at this yesterday evening (thanks to derf, crashanddie, and > others >> > for answering my C questions), trying to move some parts of the priv >> > structure to SARAM (sorry for the SRAM typo above). Unfortunately just >> > moving the bare minimum (the X array) won't happen as there's not enough >> > SARAM (so dsp_dld tells me). I don't know where it's all gone, anyone > have >> > any ideas? >> >> Do you use any buffers allocated by malloc? My guess is that malloc >> does allocation of DARAM and SARAM memory. >> In any case, memory returned by malloc should be not worse than the >> memory buffer explicitly statically placed to EXTMEM. > > Yes, I think you're right, in the avs_kernelcfg.cmd file it talks about a > DARAM_heap and a SARAM_heap, presumably it's possible to allocate from > either somehow (using the CSL MEM_* calls probably, I don't know off hand > which heap is used for task data, but will have a look this evening). It > also talks about a/the stack being in SARAM. I'm sorry if it was not clear enough. Just use normal malloc from C library without any CSL_MEM_* stuff. You can add some debugging prints for the addresses of allocated blocks and identify what kind of memory they are actually in (DARAM, SARAM, SDRAM). By the way, this information is especially important if you want to use DMA, as you need specifically configure the type of memory (not just address) when setting up DMA transfer. > To answer the question, only if the thing to be malloc'd is small. In this > case it's only a couple of structures (and they are large), so I've manually > created them in EXTMEM2. I know this is not ideal, but they won't fit in > SARAM. > > Over lunch I had a play with the things I talked about in my last email. > Removing the memcpy (from the slow SDRAM X[] array to the fast SARAM > fast_in[] array) made the code marginally slower - at least there were more > drop outs, so it appears that the memcpy() overhead is less than the extra > time needed to access the data in SDRAM. Yes, accessing SDRAM memory is extremely slow. And if you access SDRAM memory using 16-bit accesses instead of 32-bit accesses, the overhead doubles. So if your data processing algorithm does not deal exclusively with 32-bit data accesses, you are better not to run it to process data in SDRAM memory. Copying data to a temporary buffer in DARAM or SARAM, processing it there and copying results back to SDRAM would be faster in this case. > I shaved a few array elements off the output[] SARAM array (down from 100 to > 78 elements, this fits the current Bluez encoder parameters, but if they > were changed upwards, both the input[] and output[] arrays would probably > need to be made bigger). I also removed the #PRAGMAs I had been using to > place the const data from sbc_tables.h in SARAM as from looking the > avs_kernelcfg.cmd file, .const data is already placed in SARAM (SARAM_DATA > section) and I thought this might free up some room to fit the X[] array in > SARAM directly. It didn't. Moving the const tables freed 72x32bits, removing > the fast_in[] array (not needed if X[] itself is fast) freed 80x32bits, but > the X[] array requires 2x160x32bits. It still doesn't fit :( 2x160x32 bits is only 1280 bytes, which is hardly too big. Try to allocate buffers with malloc and copy constant tables there on initialization. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: DSP SBC encoder update
On Thu, Jul 10, 2008 at 2:06 PM, Simon Pickering <[EMAIL PROTECTED]> wrote: >> The change which has allowed it to encode an entire song rather than just > a >> few seconds was to move the input and output buffers from SDRAM (OMAP main >> memory) to SRAM (DSP fast single access memory). There are probably other >> things which would benefit from being moved, the sbc->priv data (or parts >> thereof) for one. This structure is pretty big so I allocated it in SDRAM, >> but at least parts of it might be better off in faster local memory. This > is >> something to look at. > > I looked at this yesterday evening (thanks to derf, crashanddie, and others > for answering my C questions), trying to move some parts of the priv > structure to SARAM (sorry for the SRAM typo above). Unfortunately just > moving the bare minimum (the X array) won't happen as there's not enough > SARAM (so dsp_dld tells me). I don't know where it's all gone, anyone have > any ideas? Do you use any buffers allocated by malloc? My guess is that malloc does allocation of DARAM and SARAM memory. In any case, memory returned by malloc should be not worse than the memory buffer explicitly statically placed to EXTMEM. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: wpa_supplicant and cx3110x
On Sunday 15 June 2008, Andrew Barr wrote: > I am using an N800 running Angstrom[0] booting off of an SD card, using > an initfs modified to multiboot as is documented on the wiki[1]. I am > having some trouble getting wpa_supplicant to authenticate on my WPA-EAP > (PEAP-MSCHAPv2/TKIP) home network. This works fine in the Internet > Tablet OS provided power management is turned off (something is > apparently wrong with 802.11 PSM on my AP). It appears to be off by > default when the tablet is booted into Angstrom > (/sys/devices/platform/wlan-omap/psm is '0'). The EAP auth seems to work > ok (eap_state in 'wpa_cli stat' is SUCCESS) but the crypto gets hung up > at 4WAY_HANDSHAKE and times out. I have observed this on my own home > network and at least one other similar network at my school. I am using > wpa_supplicant 0.6.3 linked with openssl (also tested 0.5.5 linked with > gnutls) and the 'wext' driver. The cx3110x driver is the one shipped in > the initfs for the latest official Nokia firmware. > > I can provide packet captures and similar debugging info, perhaps the > maintainer of the driver can help me out with this? The last message I > saw concerning this was from late 2007, talking about running > wpa_supplicant under ITOS and this was not yet supported as cx3110x did > not yet support WE-18 or later. This appears to no longer be the case, > as /proc/net/wireless indicates WE-22 and no version mismatch messages > are printed by iwconfig. Also, the output of 'iwlist wlan0 scan' has WPA > info, which indicates to me that WPA via Wireless Extensions _should_ > work. > > Thanks for any help. > > [0] http://www.angstrom-distribution.org/ > [1] http://maemo.org/community/wiki/HowTo_EASILY_Boot_From_MMC_card/ IIRC the recommended way to get cx3110x support is to use cx3110x-devel mailing list: https://garage.maemo.org/mailman/listinfo/cx3110x-devel BTW, there is a set of community enhancements and patches for cx3110x driver (Nokia 770 version) collected together by Rodrigo Vivi and posted here: https://garage.maemo.org/pipermail/cx3110x-devel/2008-April/38.html -- Best regards, Siarhei Siamashka ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Hardware performance counters
On 3 March 2008, Gustavo Sverzut Barbieri wrote: > On Sun, Mar 2, 2008 at 2:17 PM, Vinod Hegde <[EMAIL PROTECTED]> wrote: > > Hi Everyone, > > > > How do I access the hardware performance counters available in OMAP2420. > > what are the header files that contain the implementation. > > I tried lots to figure out but in vein. thanks for any help > > Use oprofile, or see how they do it. > > http://blog.gustavobarbieri.com.br/2007/05/22/oprofile-and-maemo-n800/ > > it's outdated, so you need to do your own kernel modules. Up to date instructions are here, it is really easy to install: http://maemo.org/development/tools/doc/oprofile/ ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: WLAN Horrible Roaming Performance (N800, OS2008), Software or Hardware Problem ?
On 22 February 2008, Frantisek Dufka wrote: > Kalle Valo wrote: > >> Also CPU usage is very high because of busyloop when waiting till > >> DMA transfer is done. Tasklet, which executes the code can't be > >> easily preempted, as far as I understand kernel documentation. Maybe > >> it is possible to split tasklet into several parts, one of them > >> could be responsible for initiating DMA transfer, the other could be > >> activated on DMA transfer completion? This all is important for > >> video streaming as any excessive CPU resources consumption by WLAN > >> driver negatively impacts video playback performance. > > > > Sorry, I'm not familiar with OMAP 1710 McBSP, so I can't comment. > > I think you don't have to. From the code (sm_drv_spi_io.c) it looks like > McBSP is setup to use dma transfer with callback when it finishes > > omap_request_dma(OMAP_DMA_MCBSP2_TX, "McBSP TX", dma_tx_callback, ... > omap_request_dma(OMAP_DMA_MCBSP2_RX, "McBSP RX", dma_rx_callback, > > and the dma_tx/rx_callback() code just sets variable > spi_dma.dma__tx/rx_done to 1. > > But the code that sends/receives the frame busyloops for it like this > > omap_start_dma(spi_dma.dma_rx_ch); > > > omap_start_dma(spi_dma.dma_tx_ch); > > > while(!spi_dma.dma_rx_done) { > > > udelay(5); > > > } > > > > > while(!spi_dma.dma_tx_done) { > > > udelay(5); > > > } > > > > > spi_dma.dma_rx_done = 0; > > > spi_dma.dma_tx_done = 0; > > > > > So there is this nice dma architecture with callback used but the code > still spins up the cpu waiting for the done flag instead of sleeping. > > So you need to be familiar with the driver and tell us if it is possible > to sleep inside cx3110x_spi_dma_read and cx3110x_spi_dma_write. And one > also needs to be familiar with kernel programming and waiting primitives > to suggest how to sleep and wait for the callback (if possible in this > context) and how to wake up the sleeping code from the dma callback. A while ago I looked for various kernel docs to see what's happening in the wlan driver and what can be done to reduce cpu load. My impression was that tasklet can be only preempted by hardware interrupts, so it is impossible to sleep in it and give cpu resources to userland applications. If that is true, no matter if n800 driver looks nicer, it must end up busylooping too. Though on Nokia 770 cpu usage is attributed to the application doing (for example wget) and on N800 it is attributed to 'OMAP McSPI/0' process. A solution that I tried to suggest might be to start DMA transfer, schedule another tasklet to run after DMA transfer is done and exit from the first tasklet. That another tasklet should get activated and do the rest of the job. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: WLAN Horrible Roaming Performance (N800, OS2008), Software or Hardware Problem ?
On Feb 14, 2008 8:43 AM, Kalle Valo <[EMAIL PROTECTED]> wrote: [...] > > other users reported it too as Luca Olivetti pointed out. and it > > seems like the problem and fix is described here: > > > > http://internettablettalk.com/forums/showpost.php?p=134914&postcount=15 > > > > at least for the 770 the fix seems to exist, > > What I read from the link, someone had written a workaround to try > again whenever the chip is responding. That would good a feature, but > I would like to get more information about what's happening in this > case. I'm sorry. For some unknown reason, I thought that I notified you about this problem long ago, but appears that we only discussed this issue privately with Frantisek Dufka :( I encountered this problem when I was checking what is the maximum McBSP clock frequency that could be used reliably on Nokia 770 to speed up WLAN performance. To do this stability test, I just put the device on charger, established wlan connection and started a test script which repeatedly executed wget to download a large file, piping it to md5sum and verifying that the file always gets received correctly. That's probably one of the most simple stress tests that can be done :) People on ITT, who are suffering from this disconnection problem are running bittorent client software which probably stresses network to a much higher extent. Having kept this simple test running, I noticed that wlan network is getting stuck eventually. Sometimes very soon and sometimes after a long time. Checking dmesg log revealed the following lines: [84936.145721] We haven't got a READY interrupt from WAKEUP. (firmware crashed?) [84940.419342] TX dropped [84940.419433] TX dropped The symptoms are similar to what other people reported as https://bugs.maemo.org/show_bug.cgi?id=329 Initially I thought that it was the effect of overclocking, but could reproduce the problem even after going back to the standard frequency. With a simple patch that just retries operation on such error, wireless connection got stable. After a long test with the test script, no problems were detected. The following lines could be occasionally seen in dmesg log and it proves that there were potential connection drops encountered, but they all did not cause any troubles in reality (MD5 of downloaded file was always OK): [50559.494232] Dynamic PSM [50559.494323] PSM timeout 1000 ms [50622.038146] We haven't got a READY interrupt from WAKEUP. (firmware crashed?) [50622.038269] Try again... [50622.038330] succeeded!!! I'm attaching the same patch here. It is not very clean, but it does the job (for Nokia 770). And I have encountered other problems with WLAN driver that are yet to be solved. For example, sometimes speed drops to ~30KB/s (that's still an unresolved mystery to me). Also CPU usage is very high because of busyloop when waiting till DMA transfer is done. Tasklet, which executes the code can't be easily preempted, as far as I understand kernel documentation. Maybe it is possible to split tasklet into several parts, one of them could be responsible for initiating DMA transfer, the other could be activated on DMA transfer completion? This all is important for video streaming as any excessive CPU resources consumption by WLAN driver negatively impacts video playback performance. n770_wlan_retry_on_wake_error.diff Description: Binary data ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: SDL, tearing, X overhead and direct framebuffer gfx
On Feb 18, 2008 1:28 PM, Tapani Pälli <[EMAIL PROTECTED]> wrote: > > Could you please verify and confirm this information? Framebuffer > > driver from OS2007 supported tearsync (via OMAPFB_FORMAT_FLAG_TEARSYNC > > flag as Frantisek mentioned), and it was used at least for video. > > Well, I have noticed some tearing in mplayer with OS2008 though. > > > This is what I've heard from our kernel team members, maybe they could > share some more light to this. AFAIK the hardware itself does not offer > sync. Is it possible to invite kernel team members to join this discussion? :) At least it would be nice if they had a look at this thread. N800 hardware definitely supports tearsync. It worked fine in OS2007 (I'm not telling that OS2008 does not support it anymore, I just can't check this till I get home). When I looked through xserver sources last time, tearsync was used for video planes, but was disabled for normal rgb updates. This can be easily explained. Video usually has lower resolution than 800x480, requires less graphics bandwidth and it is possible to display it with perfect tearsync. With tearsync enabled for rgb updates, we get an ugly tearing line at a fixed location in the bottom of screen when doing 800x480 rgb update (the first OS2008 firmware had this problem btw). Without tearsync flag set, we also get tearing, but at random locations on screen, and it is less noticeable/annoying. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: SDL, tearing, X overhead and direct framebuffer gfx
On Feb 18, 2008 9:39 AM, Tapani Pälli <[EMAIL PROTECTED]> wrote: > > When using the supplied SDL library for doing timer-based frame > > rendering, there seems to be > > - heavy tearing > > Tearing unfortunately happens because there is no vsync available for > framebuffer driver to use. Could you please verify and confirm this information? Framebuffer driver from OS2007 supported tearsync (via OMAPFB_FORMAT_FLAG_TEARSYNC flag as Frantisek mentioned), and it was used at least for video. Well, I have noticed some tearing in mplayer with OS2008 though. > > Q: I can't get the tearing away (only fixed at certain line positions). > > What am I doing wrong? > > > > > Nothing, you cannot get away from tearing. Still what about trying different LCD panel timings? For example, reducing LCD refresh rate to something like 40Hz should allow 20 full resolution fullscreen rgb updates per second with perfect tearsync. I don't dare trying such experiments myself as I'm afraid to kill LCD panel of my N800 :) Can any HW expert tell if it can be possible? Link to LCD controller docs is available earlier in this thread. On the other hand, reducing refresh rate may introduce problems for 25 and 30 fps video playback (for high resolution video only, when the time to transfer frame data over graphics bus is larger than one LCD refresh cycle). ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: SDL, tearing, X overhead and direct framebuffer gfx
On Feb 17, 2008 9:56 PM, Tobias Oberstein <[EMAIL PROTECTED]> wrote: [...] > I've read a lot of bits on the web 'bout mplayer, vsync, omapfb etc. > and tried to assemble a minimal example of using direct framebuffer > access for gfx output. > > Q: I can't get the tearing away (only fixed at certain line positions). > What am I doing wrong? Transfer framebuffer->videoram must be fast enough to complete for the period of two LCD refresh cycles, also see http://lists.maemo.org/pipermail//maemo-developers/2007-March/009202.html Using smaller source rectangle in the framebuffer will reduce data transfer time and the tearing line at the bottom will disappear (using 'new' screen update ioctl which was introduced in N800 kernel, this rectangle can be upscaled to fullscreen). You can calculate the resolution which can be used without tearing either theoretically or in an experimental way. > I wondered if there would be any plans to make SDL run directly on > framebuffer .. if not, I'd maybe give it a try. > > Q: Where can I find the sources to the OS2008 SDL? AFAIK, SDL is used pretty much unmodified. My guess is that you can get it here: http://repository.maemo.org/pool/chinook/free/source/ As for some practical solution on N800/N810, I think it makes sense tweaking xserver to add support rgb color format in Xv and tweaking SDL to use Xv for the emulation of setting arbitrary screen resolutions (setting low resolution will eliminate tearing and will be useful for games). For those interested in the topic, documentation for the Epson LCD controller used in N8x0 (S1D13745) is available here: http://vdc.epson.com/index.php?option=com_docman&task=cat_view&gid=38&Itemid=40 ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Frequencies scaling with OS2008
On 31 December 2007, Frantisek Dufka wrote: > Igor Stoppa wrote: > > Having the audio path open, but no dsp tack loaded (arm audio) sets the > > clock to 400MHz. > > Interesting, so, umm, there is way to play audio from ARM side directly? > What I tried is to play BBC radio in home screen applet which activated > only pcm2 task and arm clock dropped from 400 to 330. That lead me to > conclusion that there is no way to output audio with arm clock at > 400Mhz. Why there are pcm tasks (used when streaming internet radio) if > we could output audio from arm side without limiting ARM clock? Siarhei > apparently used a way to output audio without activating DSP from > mplayer, how? I did not do anything special. ARM clock frequency just remains at 400MHz when using esd or sdl for audio output. I did some benchmarks and it became clear that it is now faster not to touch dsp mp3 task and just do all the decoding on ARM core. In addition, my hack which used dspmp3sink from MPlayer, now has problems with audio/video synchronization in OS2008. So looks like it is a good time to drop it. Using DSP for MP3 audio was a useful trick on Nokia 770 and OS2006, but right now everything is reversed for N800/N810 and OS2008. Anyway, my guess also was that pcm dsp task is used in osso-esd, maybe it makes sense to check its sources more thoroughfully. Looks like we will have a lot of new discoveries with OS2008 :) As ARM core is quite fast in N8x0, probably it would make sense to try keeping DSP out of the way whenever possible (restrict it to 133MHz only, keep DSP tasks which are fast enough to run at this frequency, port all the other DSP tasks to ARM)? That is unless improvements to support more intelligent DSP clock frequency selection are still possible. But for those interested in C55x development, Nokia 770 is still a very interesting device as it runs DSP at full speed. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Frequencies scaling with OS2008
On 30 December 2007, Frantisek Dufka wrote: > Krischan Keitsch wrote: > > I was wondering if the device really needs to run at 300MHz (220MHz dsp) > > for mp3 playback? Is the max dsp power needed for such a task? Or would > > 220MHz (177MHz dsp) or 165MHz (85MHz dsp) be sufficient? Would a lower > > dsp scaling save even more battery? > > Well, yes it looks a bit simplistic now. Even if you play audio decoded > by ARM cpu (ogg, real audio) it seems to lock ARM core to 330 and dsp to > 220Mhz. I suppose it is because you need pcm dsp task running for audio > output and any active dsp task locks it to 220Mhz (and thus cpu to 330). > I wonder if it is just simple implementation that can be tuned in next > firmwares or there is some fundamental problem (like changing dsp clock > of already running dsp task may break it so it is hardcoded to 220). ARM cpu frequency is apparently not locked at 330MHz when using pcm dsp task. That's why it is faster to do MP3 decoding on ARM core with the current OS2008 firmware. Extra 70MHz of ARM core frequency are more than enough to handle MP3 and there are even some resources left to speed up video decoding. It is interesting if it is possible to lock ARM/DSP frequency at 400/166 instead of 330/220 when playing video. That would probably improve built-in player performance on some heavy bitrate/resolution videos. Also it is interesting to know what is the difference in power consumption between 400/166 and 330/220 modes? ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Data corruption on N770 in OS2007 HE
On 9 November 2007, Alex DAMIAN wrote: > the modified cx3110x driver works ok. I think it's a bit weird that my > WRT router would assign different IP addresses depending on the driver > loaded, but it's not an inconvenience after all. > > However I run into a bit of trouble trying to modify the initfs. I > took out the initfs from the second HE FIASCO image, and mounted it > with loop/block2mtd and copied all the files off it (including the > /dev/). This pose no problem. I replaced the cx3110x.ko file, and > tried to make the updated jffs2 image with mkfs.jffs2 (I followed your > initfs-flasher script). > > However the jffs2 image I'm building is quite different (quite a bit > smaller), mounting it gives signifiant differences in files compared > to the original one, and dumping it with jffs2dump shows lots of CRC > errors. > > I even rebuild the mtd-utils from CVS, thinking that my Fedora has > some bugged version, but the result is the same. > > Any idea about what I'm doing wrong ? Make sure that you use correct jffs2 eraseblock size (both when mounting image and for mkfs.jffs2), it is 128KiB for Nokia 770. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: DSP vs. ARM endianness and struct packing
On 14 September 2007, Charles 'Buck' Krasic wrote: > I did some experimentation a while back with DSP <-> ARM communication > via mmap'ed memory, in my case I was working on using the DSP for rgb > to yuv conversion. Another big gotcha to look out for is 64k > boundaries. The DSP (at least in the 770) just can't naturally deal > with object bigger than 64k, so you will get very bizarre results if > you run into this limitation. Isn't it more a limitation of a free dsp toolchain? I have seen a pdf where OMAP1710 was mentioned to have c55x rev 3.0 core which does not have this limitation: http://www.ocpip.org/japanese/news/presentations/Japanese_JapanTI.pdf Also when looking for various DSP related information, I found Texas Instruments public ftp with the following interesting directory: ftp://ftp.ti.com/pub/cs/v275/ It looks like a linux c55x dsp toolchain with a slightly updated version, and what is more interesting, it lists OMAP1710 as one of the supported targets. I have also read about a rather scary thing such as silicon bugs :) Looks like silicon bugs are a lot more common in DSP than the bugs in general purpuse cores. My guess is that TI is solving this problem by releasing toolchains which are able to avoid generating problematic sequences of code. In this case having a compiler that is aware of the target core (OMAP1710 and OMAP2420) would be a really nice thing to have. If a more recent toolchain proves to be useful, maybe it would make sense asking TI to include it into a free linux dsp tools package? Or at least query about its status (whether it is ok to download and use that toolchain from ftp or they put it there by some mistake). Hope that this information might be useful for dsp hackers. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Python and GStreamer
On 30 August 2007, Jesse Guardiani wrote: > Koen Kooi wrote: > > -BEGIN PGP SIGNED MESSAGE- > > Hash: SHA1 > > > > Jesse Guardiani schreef: > >>> And the related question is: given an existing program that sends > >>> stuff out to ALSA and doesn't use gstreamer, how difficult is it > >>> generally to port it so that it works properly? > >> > >> No porting necessary, really. mplayer comes with a decoder called > >> libmp3. It's not optimized for ARM or anything, and it compiles without > >> any problems. We don't use it in Kagu for A2DP though because there is > >> another decoder out there called ffmp3 which doesn't use floating point > >> math, so it's a little more efficient on ARM. > > > > use libmad (-ac mad), that works great on arm and x86. > > Any cpu savings over ffmp3? The mplayer maintainers tell me that ffmp3 > is the best choice on ARM. Just for the sake of correctness, I did not quite say that :) This is what I replied to you earlier when we were discussing A2DP performance issues: "Software MP3 (-ac ffmp3) and OGG/Vorbis (don't remember exact '-ac' option for it, but you don't need it anyway as this decoder is used by default) decoders are already enabled in mplayer. If we want the best MP3 decoding performance, libmad (-ac mad) is the best choice, but it is an external dependency and is not used right now. The worst option for software MP3 decoding on ARM is mp3lib (-ac mp3), it uses floating point math and is a lot slower than other (fixed point) MP3 decoders even on N800 which supports floating point math in hardware. You can try to test all these decoders yourself to figure out which one works the best for you." Just ffmp3 is a part of ffmpeg library and is bundled with mplayer by default and libmad is an external dependency which may make packaging a bit more complicated. That's the only reason why libmad support is not enabled in maemo build of mplayer yet. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: Java acceleration/Jazelle
On Wednesday 18 July 2007 13:01, Simon Pickering wrote: > Does anyone know whether there are there any good docs/books on ARM asm > programming, telling people these sort of things? This is an interesting > (and hopefully useful) learning experience, but can be really frustrating > when I know what I want to do, and pretty much how to, but not quite! :) > E.g. calling functions in linked libraries, how to call .s file functions > from C, what is and isn't allowed in in-line asm, etc. I would recommend checking the following documentation from ARM website: http://www.arm.com/documentation/Instruction_Set/index.html "ARM v5TE Architecture Reference Manual" for the detailed information about the instruction set (up to ARMv5TE). Unfortunately it does not cover new ARMv6 instructions (I used Quick Reference Card to get some information about them). http://www.arm.com/pdfs/aapcs.pdf "Procedure Call Standard for the ARM Architecture" for the information about calling conventions and arguments passing between asm and C and generally about ABI. http://www.arm.com/documentation/ARMProcessor_Cores/index.html "ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual" for ARM11 pipeline description and instruction timings (useful when optimizing for N800). "ARM9EJ-S Revision r1p2 Technical Reference Manual" for ARM9E pipeline description and instruction timings (useful when optimizing for 770). These four pdf files cover almost everything needed if you are interested in assembly programming for Nokia 770 and N800. But surely ARM website provides many other interesting documents worth reading. ___ maemo-developers mailing list maemo-developers@maemo.org https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: MPlayer compilation problem (armv5te)
On Friday 25 May 2007 11:42, Juuso Räsänen wrote: > I have been trying to compile MPlayer myself under Maemo 3.1. > I've used patches from: > https://garage.maemo.org/frs/?group_id=54 > > However, commands... > > [sbox-SDK_ARMEL: ~/MPlayer-1.0rc1-maemo.16] > ./configure > [sbox-SDK_ARMEL: ~/MPlayer-1.0rc1-maemo.16] > make > > ...results after a while in errors like: > > cc -c -I. -I../libvo -I.. -Wdeclaration-after-statement -O4 -pipe > -ffast-math -fomit-frame-pointer -I/usr/include -D_REENTRANT > -I/usr/include/gstreamer-0.10 -I/usr/include/glib-2.0 > -I/usr/lib/glib-2.0/include -I/usr/include/libxml2-I/usr/include/SDL > -D_REENTRANT -I/usr/include/freetype2 -DMPG12PLAY -o idct_armv5te.o > idct_armv5te.c idct_armv5te.c: In function `idct_row': > idct_armv5te.c:125: warning: ISO C90 forbids mixed declarations and code > {standard input}: Assembler messages: > {standard input}:107: Error: selected processor does not support `smulbb > fp,lr,r9' {standard input}:109: Error: selected processor does not support > `smlabb r5,r0,ip,fp' This error is caused by the use of armv5te instructions in inline assembly while gcc is not ordered to support them (either with -march or -mcpu options). In order to fix this problem and compile the package with the best settings for N800, you can use: CFLAGS="-mcpu=arm1136jf-s -mfpu=vfp -mfloat-abi=softfp -O3 -fomit-frame-pointer -ffast-math" ./configure make Or just type 'make deb-n800' or 'make deb-n770' to build a .deb package. See 'debian/rules' file for the options used when building mplayer packages. But I think that maemo developers mailing list is not the best place to discuss such application specific issues which do not have any direct relation to maemo platform in general. It would probably make sense to use 'support' tracker at https://garage.maemo.org/projects/mplayer/ for this instead. If you have any other questions or still need help, feel free to e-mail me directly. I guess, discussing the programming techniques and optimizations used in mplayer which are potentially usable in other maemo applications is ok here, but such boring stuff as compilation issues is unlikely to be interesting to anyone else :) ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: Xsp pixel-doubling solutions for Nokia 770?
On Thursday 03 May 2007 12:58, Eero Tamminen wrote: > ext Siarhei Siamashka wrote: > > What problem with using framebuffer directly? Everything should be > > fine, you can get notifications from xserver when your window becomes > > obscured, so you can stop drawing. I suggest you to try MPlayer on > > Nokia 770 to check how it interacts with xserver. The worst thing that > > can happen is some garbage data left on screen on fast applications > > switching. That can happen because there is no support to synchronize > > access to framebuffer in a reliable way (application using framebuffer > > directly may get notification from the xserver about getting inactive > > too late and overwrite some other application window). > > > > Adding support to xserver for proper synchronization with direct > > framebuffer access applications should be quite possible. It already > > synchronizes access to framebuffer with DSP (Xsp API for registering > > DSP area). Almost all the necessary changes will probably have to be > > added at the same places in xserver which support interaction with > > DSP. > > AFAIK Xserver requests & waits DSP to stop updating the framebuffer > before proceeding. This works with HW, but you cannot expect it to > work reliably with misc X clients as there are no guarantees about > what they do. If client is not processing X events, the response would > be waited forever and device freezes. If X server has some timeout for > the client reply, then the server and client can be updating the > framebuffer at the same time and that was what we wanted to avoid > in the first place. Timeout is a perfectly valid solution in my opinion. It just requires that xserver and some thin wrapper library used by misc clients (SDL) both interact correctly. Interface of this wrapper library should be designed in such a way, that it is safe and hard to be misused (special timeout code which automatically terminates the process which refuses to give framebuffer back to xserver). I may provide some extra details about my vision of implementation details if anybody is interested. > > I guess it can't be helped and I will make an example application for > > using framebuffer directly and some kind of tutorial. Don't know when > > I will have enough free time to do this though. > > > > I'm pretty much confident that direct framebuffer access (also with > > pixel doubling support) is quite possible for SDL. I don't care much > > if you believe me or not :) Someone just needs to do the dirty work > > and implement all this stuff. > > Yes, it just cannot be done safely / reliably. I can't be completely sure, but I think it is possible to do safely/reliably. At least it is worth trying in my opinion. The difference in our views is that you see xserver as the only valid Nokia 770 citizen and everything else looks like a very ugly hack to you. I see the problem from the completely different perspective. For many games xserver is irrelevant, they use SDL API and that is what they care about (xserver is just an additional extra layer). Game developers would like to have a fast and reliable SDL implementation which could make efficient use of all the hardware features that can benefit games. If xserver can provide all of this with some standard or nonstandard extensions, that's fine. We only need to estimate the amount of development resources and time needed to do these modifications to xserver and SDL to make use of these features. As games are not a primary target for Internet Tablets, I doubt that anything like this will be officially done any time soon (at least before the Nokia 770 end of life). Am I wrong? In this case tweaking SDL to use framebuffer directly may have a much lower cost. Especially considering that you have already solved this framebuffer sharing problem for DSP video playback. I did not suggest anything completely new ;) It is not quite related, but games also need a reliable and low latency method to play sounds. Current esd daemon solution is not very good for games. Maybe modifying SDL to deal with dsp tasks directly can provide some improvement. Also it would be very nice if SDL_Mixer could use dsp codecs transparently without any extra hacks to play mp3 music. > But for hackers it's enough that it works when it works I guess. :-) I'm not sure if I can consider myself a hacker :) Something that just works is perfectly enough for a prototype. But a production system needs a reliable solution, hence I'm trying to start discussing the implementation details. SDL optimization for Nokia 770 might be an interesting task for some student with lots of free time. In any case, trying alternative solutions is a good thing, it drives the progress, allows us to
Re: N800 & Video playback
On Friday 04 May 2007 10:49, Daniel Stone wrote: > On Thu, May 03, 2007 at 11:10:32PM +0300, ext Siarhei Siamashka wrote: > > Well, found what's the matter and added explanation at bugzilla: > > https://maemo.org/bugzilla/show_bug.cgi?id=1281 > > > > The workaround can be easily added to MPlayer, so that it will > > never call XvShmPutImage with top left image corner at an odd line. > > I'm going to release an updated MPlayer package (maybe even > > a bit later today), it is really fast on N800 with the optimized xserver > > :) > > Aha, that will indeed cause a fallback (x, y, width and height should > all be aligned to 4px). Could you clarify this information? The code from kernel framebuffer driver (blizzard.c) suggests that only width should be 4px aligned: switch (color_mode) { case OMAPFB_COLOR_YUV420: /* Embedded window with different color mode */ bpp = 12; /* X, Y, height must be aligned at 2, width at 4 pixels */ x &= ~1; y &= ~1; height = yspan = height & ~1; width = width & ~3; break; Does xserver introduce additional limitations? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Nokia 770 & Video playback
On Monday 30 April 2007 14:27, Siarhei Siamashka wrote: > I also tried to use YUV420 on Nokia 770, but it did not work well. > According to Epson, this format should be supported by hardware. Also there > is a constant OMAPFB_COLOR_YUV420 defined in omapfb.h in Nokia 770 kernel > sources. But actually using YUV420 was not very successful. Full screen > update 800x480 in YUV420 seems to deadlock Nokia 770. Playback of centered > 640x480 video in YUV420 format was a bit better, at least I could decipher > what's on the screen. But anyway, it looked like an old broken TV :) Image > was not fixed but floating up and down, there were mirrors, tearings, some > color distortion, etc. After video playback finished, the screen remained > in inconsistent state with a striped garbage displayed on it. Starting > video playback with YUY2 output fixed it. But anyway, looks like YUV420 is > not supported properly in the framebuffer driver from the latest OS2006 > kernel. That's bad, it could provide ~30% improvement in video output > perfrmance for Nokia 770. Maybe upgrading framebuffer driver can fix this > issue (and add tearsync support). By doing a quick kernel framebuffer code review, looks like the problem may reside in the following fragment from hwa742.c: switch (color_mode) { ... case OMAPFB_COLOR_YUV420: bpp = 12; conv = 0x09; transl = 0x25; break; ... } ... set_window_regs(x, y, x + w, y + h); offset = (scr_width * y + x) * bpp / 8; hwa742.int_ctrl->setup_plane(OMAPFB_PLANE_GFX, OMAPFB_CHANNEL_OUT_LCD, offset, scr_width, 0, 0, w, h, color_mode); hwa742.extif->set_bits_per_cycle(16); hwa742.int_ctrl->enable_plane(OMAPFB_PLANE_GFX, 1); hwa742.extif->transfer_area(w, h, request_complete, req); As far as understand it, this code notifies the graphics chip about what screen area it is going to update and starts DMA transfer to fill it with data. But a similar code from 'blizzard.c' also does width correction before 'transfer_area' by doing 'w = (w + 1) * 3 / 4;'. Looks like code from hwa742.c attempts to transfer more data than the graphics chip expects for YUV420 format. This can explain vertical image drift observed in my previous experiments (for 640x480 area starting at 0,80), also excess data may deadlock the graphics chip (for the test with 800x480 area starting at 0,0). Also the 'offset' may be calculated wrong, see [1] for some bits of information about YUV420 framebuffer layout on N800. Starting location should be probably always calculated assuming 16bpp framebuffer layout. Now I wonder who can be considered upstream for this kernel framebuffer driver? I guess reporting a bug at maemo bugzilla is pointless as there are no official updates for Nokia 770 planned. I wonder why this driver is still getting some updates in newer versions of linux omap kernel and what hardware is used to verify that it works? Is it still tested on Nokia 770? 1. http://maemo.org/pipermail/maemo-developers/2007-May/010039.html ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Thursday 03 May 2007 10:21, Frantisek Dufka wrote: > Siarhei Siamashka wrote: > > If decoding time for > > each frame will never exceed 28-29ms (which is a tough limitation, cpu > > usage is not uniform), video playback without dropping any frames will be > > possible even with tearsync enabled. > > Would a double or multiple buffering help with this? Yes, most likely it will. N800 has 800x480 virtual size for framebuffer and a new enhanced screen update ioctl. Now it should be possible (did not try yet, but will have some results very soon) to specify output position and size for the rectangle as it gets displayed on the screen. struct omapfb_update_window { __u32 x, y; __u32 width, height; __u32 format; __u32 out_x, out_y; __u32 out_width, out_height; __u32 reserved[8]; }; This theoretically allows us to use some kind of double buffering, we can split framebuffer into two 400x480 parts and while one part is being displayed, another one can be freely filled with the data for the next frame. This will effectively remove the need for OMAPFB_SYNC_GFX, improving peak framerate. But this solution will require support for arbitrary downscaling in YUV420 format for each video frame to fit 400x480 box. The quality will be also reduced a bit, but on the other hand, graphics bus should have no performance problems with sending 400x480 through it. If virtual framebuffer size could be extended to 800x960, this would allow us to use doublebuffering without sacrificing resolution. Anyway, I'll try to fix MPlayer framebuffer output module to properly work with the latest version of N800 firmware and implement this form of doublebuffering. It should provide the fastest video output performance that is possible. Regarding Nokia 770, now it uses 800x600 framebuffer virtual size (some extra waste of RAM?). Anyway, if hwa742 kernel driver could be extended to support this improved screen update API and respect 'out_x' and 'out_y' arguments, we could have four video pages in framebufer memory for 400x240 pixel doubled video output. It could allow to implement a very efficient double buffering for accelerated Nokia 770 SDL project if it ever takes off the ground :) > Does mplayer use different threads for displaying and decoding and decode > frames in advance? No, it doesn't have any extra threads now. But video playback on Nokia 770 is already parallel, splitting tasks between the following pieces of hardware each working simultaneously: 1. ARM core (demuxing and decoding video into framebuffer) 2. DMA + graphics controller (screen update transferring data from framebuffer into videomemory and performaing YUV->RGB conversion on the fly) 3. C55x DSP core (mp3 audio decoding and playback) There is not much point in creating many threads on ARM, as we only have a single ARM core and splitting work into several threads will not accelerate overall performance. Threads could be useful for doing something extra while waiting for other hardware components to finish their work (waiting for screen update for example), but decoding ahead will also require storing the decoded data somewhere. This place for storing decoded ahead frames could be only some extra space in framebuffer memory, otherwise we would lose some performance on moving this data to framebuffer later (and increasing battery power consumption). As framebuffer space is limited, we would not be able to store many frames ahead, and decoding cpu usage most likely varies not between frames but more like between different scenes (complicated action scene will make us run out of decode ahead buffer pretty fast). Anyway, probably this may be worth trying later, there even exists some threads based MPlayer fork: http://mplayerxp.sourceforge.net/ ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Thursday 03 May 2007 08:48, Siarhei Siamashka wrote: > The only thing which is unclear here is that Hailstorm does not need to > downscale video in this situation. The bug can be reproduced with 512x288 > video which just needs upscaling to 800x450. Also even standard > Nokia_N800.avi video with proper aspect ratio causes a huge > performance regression and tearing. > > Please give this #1281 issue another look. It looks like a bug in xserver, > but not a hardware limitation. I can probably try to workaround it by > requesting not 512x288 buffer from Xv, but something like 512x308, use > only 512x288 part of it and artificially add black bands above and below. > After that, Xv can be asked to expand it to 800x480 to get expected result > But if it is a bug in xserver, it would be better to get it fixed, > preferably before the next firmware update :) Well, found what's the matter and added explanation at bugzilla: https://maemo.org/bugzilla/show_bug.cgi?id=1281 The workaround can be easily added to MPlayer, so that it will never call XvShmPutImage with top left image corner at an odd line. I'm going to release an updated MPlayer package (maybe even a bit later today), it is really fast on N800 with the optimized xserver :) ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: Xsp pixel-doubling solutions for Nokia 770?
On 5/3/07, Eero Tamminen <[EMAIL PROTECTED]> wrote: [...] Same problem as using framebuffer directly. How user switches to another application? How to invoke power menu properly etc. What problem with using framebuffer directly? Everything should be fine, you can get notifications from xserver when your window becomes obscured, so you can stop drawing. I suggest you to try MPlayer on Nokia 770 to check how it interacts with xserver. The worst thing that can happen is some garbage data left on screen on fast applications switching. That can happen because there is no support to synchronize access to framebuffer in a reliable way (application using framebuffer directly may get notification from the xserver about getting inactive too late and overwrite some other application window). Adding support to xserver for proper synchronization with direct framebuffer access applications should be quite possible. It already synchronizes access to framebuffer with DSP (Xsp API for registering DSP area). Almost all the necessary changes will probably have to be added at the same places in xserver which support interaction with DSP. I guess it can't be helped and I will make an example application for using framebuffer directly and some kind of tutorial. Don't know when I will have enough free time to do this though. I'm pretty much confident that direct framebuffer access (also with pixel doubling support) is quite possible for SDL. I don't care much if you believe me or not :) Someone just needs to do the dirty work and implement all this stuff. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Wednesday 02 May 2007 12:39, Daniel Stone wrote: > On Wed, May 02, 2007 at 09:16:01AM +0300, ext Siarhei Siamashka wrote: > > On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote: > > > Results with unpatched xserver and some more explanations can be found > > > in [3]. > > > Yes, now N800 is faster than Nokia 770 for video output performance at > > > last :) > > > > Well, still not everything is so good until the following bug gets fixed: > > https://maemo.org/bugzilla/show_bug.cgi?id=1281 > > > > The patch for optimized Xv performance will not help to watch widescreen > > video which triggers this tearing bug. If you see tearing on the screen, > > you should know that the YUV420 color format conversion optimization > > patch does not get used at all and xserver most likely uses a slow > > nonoptimized YUV422 fallback code with software scaling. > > Indeed. And the reason the code is there is because Hailstorm can only > downscale at fixed ratios (half and one-quarter), and even then, it > locked up when we tried. Similarly, the display controller's > downscaling didn't work, either. So we can optimise the fallback path, > but you'll still be screwed by sending 16bpp (instead of 12bpp) through > RFBI. The only thing which is unclear here is that Hailstorm does not need to downscale video in this situation. The bug can be reproduced with 512x288 video which just needs upscaling to 800x450. Also even standard Nokia_N800.avi video with proper aspect ratio causes a huge performance regression and tearing. Please give this #1281 issue another look. It looks like a bug in xserver, but not a hardware limitation. I can probably try to workaround it by requesting not 512x288 buffer from Xv, but something like 512x308, use only 512x288 part of it and artificially add black bands above and below. After that, Xv can be asked to expand it to 800x480 to get expected result But if it is a bug in xserver, it would be better to get it fixed, preferably before the next firmware update :) > > Fixing this bug is critical for video playback performance. I hope it > > will be solved in the next version of N800 firmware too. But it we get > > some patch to solve this problem for testing earlir, that would be nice > > too. > > The only patch is optimising that function, really. Even if we did work > out a way to make Hailstorm happy, you can still only scale at those > exact multiples, which doesn't make it a viable general solution. I will do optimized software YV12->YUV420 JIT scaler a bit later (on next weekend?). It will be only a minor modification of YV12->YUV422 scaler which already exists and works fine. If it can be useful for xserver, it might be added there at any time. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Wednesday 02 May 2007 12:47, Daniel Stone wrote: > > > X11 error: BadValue (integer parameter out of range for operation) > > > MPlayer interrupted by signal 6 in module: flip_page while gstreamer > > > did play them just fine. Also the Nokia_N800.avi and NokiaN93.avi died > > > in the same way. > > > > This X11 error on video playback start and also sometimes on switching > > fullscreen/windowed mode is a known problem [1] reported in this mailing > > list. > > > > If MPlayer dies on start, usually trying to start it again succeeds. So > > these 320x240 and 352x288 videos could be played as well if you were a > > bit more persistent :) > > Resizing is a bit tricky. Most video hardware lets you use the hardware > to clip, so if you move it beyond the edge of the screen, it just > happily ignores anything beyond the hardware's bounds. Unfortunately > for us, attempting to move a video surface off-screen (even by just a > few pixels) triggers a hardware lockup. > > Given that we can't display the frame at all, we send BadValue (there > are a couple of other conditions where this is possible, but this is the > main one). I don't see the point in returning Success when no video is > drawn at all. So, I guess you could hack mplayer's error handler to > just ignore BadValues from Xv(Shm)PutImage, unless you get more than > five or ten in a row, say. Thanks for the hint, I'll try it. > Bear in mind that, as you've hinted at, the only part of the Xv code > which is custom is the _output_ code. We're using the standard X server > implementation (as used by tens of millions of people) for the protocol > decode and standard semantics, the standard KDrive layer for extended > stuff (as used by god-knows-how-many embedded and consumer devices), and > then the only part we have to play is taking frames and putting them on > the screen. > > Due to some restrictions (as above), we have to deliberately error out > on some operations. But errors like that tend to say 'you've hit a > hardware restriction, I can't do this', rather than 'you hit one of the > many random return BadValues we put in this weird code just to confuse > people'. That's the interesting information, thanks. > Also, bear in mind that a lot of the initial instability was due to the > DSP. The video was actually rather stable when you played without > sound, although now the situation is somewhat reversed with the DSP > being pretty steady now, and the new YUV420 code having complicated > semsnatics. Well, I was planning to raise this issuer later (after xserver/Xv things are clear), but looks like DSP still has some problems on N800. In MPlayer it can be triggered by a number of very fast sequential gstreamer pipeline start/stop operations which usually happen on seeking. Audio playback just hangs. Right now MPlayer artificially introduces 100ms pause to workaround this problem. I tried to reproduce the same issue on a small test program, but did not succeed yet. > > I have also submitted this patch to maemo bugzilla, hopefully it (or its > > modification) can get included into the next version of N800 firmware: > > https://maemo.org/bugzilla/show_bug.cgi?id=1278 > > I'll merge it with some changes. Thanks a lot. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Wednesday 02 May 2007 12:54, Daniel Stone wrote: > > The 'framebuffer' is just the ordinary system memory, converting color > > format and copying data to framebuffer will be done with the same > > performance as simulated in this test. RFBI performance is only critical > > for asynchronous DMA data transfer to LCD controller which does not > > introduce any overhead and is performed at the same time as ARM core is > > doing some other work (decoding the next frame). RFBI performance matters > > only if data transfer to LCD is still not complete at the time when the > > next frame is already decoded and is ready to be displayed. When playing > > video, ARM core and LCD controller are almost always working at the same > > time performing different tasks in parallel. I think I had already > > explained these details in [1] > > Right. My point is that the numbers you're showing -- while very good, > don't get me wrong -- won't necessarily have a huge direct impact on > video playback. Particularly if you want to avoid tearing. I have no idea what other proof would be enough for you. You already got all the numbers, and even benchmarks with patched xserver. They all confirm video output performance improvement. > > So now the results of the tests are consistent - when doing video output, > > most of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420' > > function. > > Well, either that, or just waiting for RFBI transfers to complete. You need to wait a bit before displaying the next frame anyway, and the period between frames for 30 fps video usually eclipses transfer completion time. If you want some numbers, now 640x480 YUV420 (12bpp) screen update takes now 25ms without tearsync flag enabled (OMAPFB_FORMAT_FLAG_TEARSYNC for OMAPFB_UPDATE_WINDOW ioctl) and 25-42ms with tearsync. For 30 fps video, period between performing screen updates is normally 33ms. For playing video, we initiate RFBI transfer, wait till it completes, perform VY12->YUV420 color format conversion (which should take less than 4ms for 640x480 considering benmchmark results), wait till it is time to display the next frame and start RFBI transfer again. For 30 fps video 25ms+4ms is less than 33ms, so without tearsync enabled, any 640x480 video should play fine (considering video output performance). With tearsync enabled, we should add the time needed for performing vertical sync in LCD controller which breaks our nice numbers. Worst case (17ms wait for retrace + 25ms for actual data transfer) takes more time than 33ms between frames. We can be saved if LCD controller internal refresh rate is really 60Hz, it this case video playback will automagically synchronize to LCD refresh rate and each frame processing will be done exactly within 2 LCD refresh cycles (by the time we want to display a video frame, the next vertical will be near and we will not lose much time waiting for it). If decoding time for each frame will never exceed 28-29ms (which is a tough limitation, cpu usage is not uniform), video playback without dropping any frames will be possible even with tearsync enabled. That's what I'm investigating now. In any case, getting ideal 24 fps playback will be a bit easier. I hope all these explanations are clear now. And this is not just a theory, but already confirmed by some experiments and practical tests. > I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ... Thanks, that is what I would consider 'additional tips and tricks' :) It is good to know that maemo 3.x development can be also done with older scratchbox (I have 0.9.8.8 installed now), I'll try it without upgrading scratchbox then. > > Well, anyway, everything worked perfectly and I could play 640x480 video > > on N800 with the following statistics: > > > > VIDEO: [DIVX] 640x480 12bpp 23.976 fps 886.7 kbps (108.2 kbyte/s) > > ... > > BENCHMARKs: VC: 87,757s VO: 8,712s A: 1,314s Sys: 3,835s = > > 101,618s BENCHMARK%: VC: 86,3592% VO: 8,5736% A: 1,2932% Sys: 3,7740% > > = 100,% BENCHMARKn: disp: 2044 (20,11 fps) drop: 355 (14%) total: > > 2399 (23,61 fps) > > > > As you see, mplayer took 8.712 seconds to display 2044 VGA resolution > > frames. If we do the necessary calculations, that's 72 millions pixels > > per second, quite close to 'yv12_to_yuv420_line_armv6' capabilities > > limit, so this function is the only major contributor to video output > > time. Video output took much less time than decoding, so it proves that > > video output overhead can be reduced to minimum (in this test tearsync > > was not used though). > > I'd be curious to see the results from this with tearsync _enabled_? > i.e., after your OMAPFB_UPDATE_WIDNOW call, issue an OMAPFB_SYNC_GFX > ioctl before you start writing to memory again. This is basically the > limiter for us at this stage. That's exactly how MPlayer works. It always waits on OMAPFB_SYNC_GFX before filling framebuffer with the data for the next frame. Not issuing OMAPFB_SYNC
Re: Xsp pixel-doubling solutions for Nokia 770?
On Wednesday 02 May 2007 23:01, Arnim Sauerbier wrote: > If the memcpy on 770 is something like 190MB/s, pushing 800x480 at 30fps > would use only 12 percent of that bandwidth I'm sorry, I was the source of this misleading information, I forgot that you are a Nokia 770 user and mentioned some numbers from N800. I measured the memory bandwidth as ~170-190MB/s for memcpy and ~410MB/s for memset on N800. The same numbers on Nokia 770 are ~70-100MB/s for memcpy (depending on relative source and destination buffers alignment) and ~120MB/s for memset with the standard glibc functions. These glibc functions have ARM assembly optimizations developed by Nicolas Pitre from MontaVista Software, Inc., but according to comments found inside, they were developed for XScale cpus. Such code is not so good for Nokia 770 and can be replaced with something better. Using some arm926ej-s specific optimizations, it is possible to get ~100-110MB/s for memcpy and ~270MB/s for memset on Nokia 770. More details and a link to the necessary code can be found here: https://maemo.org/bugzilla/show_bug.cgi?id=733 Maybe it is time to try getting these optimized functions integrated into glibc for use on Nokia 770? Surely they need to be tested a bit more first. But improving core system components (glibc, xserver, SDL, ..) may help to make Nokia 770 at least a bit faster and more competitive. Any comments are surely welcome. I wonder if it would be possible to get a community improved firmware for Nokia 770 created (with bootmenu included, improvements to the kernel, gstreamer ogg vorbis support out of the box, some performance optimizations and bugfixes) and become available for download somewhere? Because of proprietary parts, probably this firmware should be hosted by Nokia in the standard place where the user needs to enter serial number to download it? Of course it would be unofficial and unsupported just like the hacker's edition. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: Toolchain upgrade? (Was: Instructions cache flush on ARM)
On Wednesday 02 May 2007 18:48, Eero Tamminen wrote: > On x86 I prefer valgrind/cachegrind/callgrind/kcachegrind as > that way one can browse the source code interactively with > the profiling information. Getting to know how the source > really works is sometimes more useful than knowing the exact > bottleneckedness percentage of some function. Sure, I'm also using valgrind/cachegrind/callgrind/kcachegrind in my work quite often. It's a very nice tool. But callgrind for statistics does not provide information about floating point math and integer divisions, so real results on ARM may be really very different. Also cache behaviour on Nokia 770 arm926ej-s core is very different from cache on x86. Actually arm926ej-s does not allocate cache line on write miss and all the x86 cpus do. This makes very big difference for the code which does lots of writes to uncached memory. Cachegrind only simulates write-allocate cache. I created the following patch for simulating read-allocate behaviour in callgrind (for more precise arm926ej-s simulation): http://ufo2000.sourceforge.net/files/vg-read-allocate-cache-patch.diff Though arm1136jf-s core from N800 now supports write-allocate cache and this patch is not needed when optimizing for N800 :) > > Did anybody try installing newer toolchains in scratchbox and use them > > with maemo SDK? I just don't have much free time for these experiments > > and don't want to break my installation of scratchbox which works now > > (more or less acceptable) > > Installing new toolchains for Sbox shouldn't be a problem (if it's > already available for it) and you can make a new Sbox target for each > toolchain you want to test. Thanks, I'll try that. In my preliminary tests, mplayer becomes a few percents faster for mpeg4 decoding when switching to gcc 4.1.1 (tested a build compiled with a crosscompiler outside scratchbox, with no audio/video output except for SDL, so not really useful for end users, but fine for benchmarking with gprof). > > Building packages with new toolchain would probably need to have > > libstdc++ linked statically for C++ applications to work on 770/N800, but > > otherwise everything should be fine. > > Actually, you cannot really build static binaries with Glibc. > It links some stuff always dynamically (nss for example). > I don't know whether this is a problem in practice though. I'm not going to statically link with glibc, but only with libstdc++ (standard c++ library). There are a few known tricks to make gcc link with libstdc++ statically, but dynamically with all the rest of libraries. One of them is creating a symlink to libstdc++.a in some empty directory and specify this directory with -L option in gcc command line. When gcc will start linking, it will be fooled to link with a static libstdc++ library. But I guess just killing libstdc++.so in scratchbox will do the the job. After that, the compiler theoretically should create binaries which should run with no problems on the device even for c++ applications. > > http://arm.com/documentation/ARMProcessor_Cores/index.html > > 'ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual' > > Chapter 4 'Unaligned and Mixed-Endian Data Access Support' > > Did you read the section on "ARMv6 unaligned data access restrictions"? > Basically it doesn't work in all cases, the accesses are not atomic and > have performance implications. Did you also read Intel docs? Unaligned access has some restrictions on x86 as well. Do you have an example of some practical case where hardware unaligned support from ARM11 would work worse than on x86? The compiler should do the job aligning data for performance reasons (as it does on x86 as well). But if you happen to have some unaligned data in memory anyway, just reading it with some minor unavoidable performance penalty will be faster than reading data one byte at a time and combining it into a 32-bit or 16-bit value (instructions timings can be also found in this Technical Reference Manual). Enabling hardware unaligned access support should make explicit pointer conversion hacks that are sometimes used in not very portable C code work just like they do on x86. Which is a good thing in my opinion. > > As ARM11 core used in N800 is little endian, does have floating point > > unit and supports unaligned memory access in hardware (which only needs > > to be enabled). It probably doesn't have any serious portability issues > > to be aware of anymore and vast majority of software initially developed > > for x86 should be easy to compile and run on it even without doing any > > modifications. > > Compiler aligns everything correctly if your code is correct. > I think non-aligned code is bug and performance issue. In the real world such buggy code unfortunately exists. And it works fine on x86 which is probably the most widely used platform for software development. > > Enabling unaligned memory support will make life much easier for > > developers unfami
Re: Xsp pixel-doubling solutions for Nokia 770?
On Wednesday 02 May 2007 20:40, Daniel Stone wrote: > > For the use case which is being described here - namely always full > > screen applications which need exclusive access to the display at a > > lower resolution Why not do something like switch to another VT and do > > it directly on the framebuffer ? and then wrap this with something > > that makes sure you can always safely return to/from X - maybe > > something managed through systemUI or some such. This is a different > > approach but could prove simpler in the long run though I havn't > > thought long and hard about it so there could be some obvious > > downsides - More a random idea :) > > Egh, my eyes! Dealing with input in particular could be a pain. This is what works for MPlayer on Nokia 770. It creates x11 window just to reserve some screen space and prevent other applications from using it. After that, it renders data directly to framebuffer and uses x11 for input. It is not very clean, but it works. And it works fast. The same trick can be probably done for SDL. Here is a link to the old discussion in the mailing list with the initial idea: http://maemo.org/pipermail/maemo-developers/2006-December/006646.html ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: Xsp pixel-doubling solutions for Nokia 770?
On Monday 30 April 2007 12:26, Frantisek Dufka wrote: > Daniel Stone wrote: > > Specifying that pixels must be exactly _doubled_ is a > > hack around both the performance issues and a lack of resolution > > independence. Apparently an important one, if you happen to like SDL > > games, but a hack nonetheless. > > Yes limiting ourselves to doubling is bad. Why not to add custom ratio > if N800 can do that. > > This all leads to request to have some more advanced gaming API. Sadly > this is probably not what internet tablets are currently designed for. > Gamers are big target group and this device is meant for entertainment > so maybe extending target audience to gamers in not that bad idea. > Gaming devices are moving online too so this is direct competition. Why > to buy internet tablet if better Sony or Nintendo device in future will > do this too plus gaming. Unfortunately gaming business has complicated > rules similar in complexity to devices with GSM radio. BTW are internet > tablets in same Nokia multimedia division as N-Gage? Well, SDL is to some extent this advanced gaming API, its current implementation for Nokia 770/N800 is just poor. As for pixel doubling, a practical solution would be just to support 400x240 fullscreen resolution in SDL so that no extra hack would be required when porting each game or emulator in particular. N800 hardware probably makes it possible to set any resolution up to 800x480, with all this available using standard SDL API. Having support for both pixel doubled and normal graphics in the same game may be useful, but it will require extra efforts when porting games, while low resolution may already work out of the box without doing any tweaks to the sources. Let's try the simple solution first. The very first step would be to take Nokia 770 xserver and SDL sources and tweak them until setting 400x240 fullscreen resolution works transparently for any SDL applications. Anybody up to this task? Also it would be a good idea to benchmark SDL, identify maemo or ARM architecture related bottlenecks and try to fix them. Many older generation games worked perfectly on hardware way slower than Nokia 770. So Nokia 770 may be a good platform for mobile gaming if properly optimized (though I'm not sure about realtime games because of unsuitable controls). I could probably do these optimizations myself, but have quite a limited amount of free time available for free software development. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote: Looks like I have to reply to myself. > On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote: > > Applied and build without problems for me. > > Thanks a lot for building the package and putting it for download, > everything seems to be fine, but more details will follow below. [snip] > Anyway, the new xserver package works really good. If we do some tests with > the standard Nokia_N800.avi video clip, we get the following results with > the patched xserver: > > # mplayer -benchmark -quiet -noaspect Nokia_N800.avi > BENCHMARKs: VC: 29,764s VO: 7,666s A: 0,468s Sys: 64,635s = 102,534s > BENCHMARK%: VC: 29,0287% VO: 7,4767% A: 0,4565% Sys: 63,0381% = 100,% > BENCHMARKn: disp: 2504 (24,42 fps) drop: 0 (0%) total: 2504 (24,42 fps) > > # mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi > BENCHMARKs: VC: 30,266s VO: 5,490s A: 0,467s Sys: 66,286s = 102,509s > BENCHMARK%: VC: 29,5255% VO: 5,3554% A: 0,4560% Sys: 64,6631% = 100,% > BENCHMARKn: disp: 2501 (24,40 fps) drop: 0 (0%) total: 2501 (24,40 fps) > > Results with unpatched xserver and some more explanations can be found in > [3]. > Yes, now N800 is faster than Nokia 770 for video output performance at > last :) Well, still not everything is so good until the following bug gets fixed: https://maemo.org/bugzilla/show_bug.cgi?id=1281 The patch for optimized Xv performance will not help to watch widescreen video which triggers this tearing bug. If you see tearing on the screen, you should know that the YUV420 color format conversion optimization patch does not get used at all and xserver most likely uses a slow nonoptimized YUV422 fallback code with software scaling. Fixing this bug is critical for video playback performance. I hope it will be solved in the next version of N800 firmware too. But it we get some patch to solve this problem for testing earlir, that would be nice too. > Video output overhead on N800 is really at least halved. Of course, video > output takes only some fraction of time in video player. So overall > performance improvement for Nokia_N800.avi playback is approximately 20% > but not 250%-300% which can be observed for 'omapCopyPlanarDataYUV420' > function alone. Before anybody noticed, correcting myself :) This 'omapCopyPlanarDataYUV420' has 2.5x-3x improvement which is equal to 150%-200% in percents. Elementary arithmetics is tough when you are tired ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote: > > OK, here is this untested a patch for xserver to add ARMv6 optimized > > YUV420 color format conversion. Theoretically it should compile > > (I did not try to build xserver myself though) and work. If it refuses to > > compile, fixing the patch should be not too difficult. > > Applied and build without problems for me. Thanks a lot for building the package and putting it for download, everything seems to be fine, but more details will follow below. > For testing, I fabricated some video with gstreamer: > > which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some > reason 320x240 and 352x288 refused to play with: > > X11 error: BadValue (integer parameter out of range for operation) > MPlayer interrupted by signal 6 in module: flip_page while gstreamer did > play them just fine. Also the Nokia_N800.avi and NokiaN93.avi died in the > same way. This X11 error on video playback start and also sometimes on switching fullscreen/windowed mode is a known problem [1] reported in this mailing list. If MPlayer dies on start, usually trying to start it again succeeds. So these 320x240 and 352x288 videos could be played as well if you were a bit more persistent :) As Daniel replied in one of the followup messages, it is most likely some race condition. The question is which code is a suspect. Is it MPlayer Xv video output code that has been around for ages and worked fine on different systems or relatively new Xv extension code from N800 xserver? In addition, a previous revision of N800 firmware had a serious bug [2] related to video playback. It should be noted, that MPlayer needed only about 1 minute to freeze on the initial N800 firmware. So the problem could be identified much more easily if MPlayer was included in the standard set of tests done by Nokia QA staff before each new IT OS release. Surely, Nokia is only interested in a properly working xvimagesink for the software included in IT OS by default. But testing with more client applications can improve overall xserver quality. With all that said, I don't know if MPlayer Xv code is bugfree, it wasn't me who developed it. > My mplayer is compiled from the svn > trunk of the garage project, with some additional cflags I use (so > maybe those were the problem...). Do you have a set of cflags settings which work better than the default set? Can you share this information? > There's something fishy in the decoding or something as the color bars > in the test video were broken (yellow and cyan to be precise), but > that seemed to be the case in a "vanilla" image too so nothing to do > with this patch. I could not see any other glitches in the output. > > But on to the results: > > VIDEO: [DX50] 640x480 24bpp 30.000 fps 1597.6 kbps (195.0 kbyte/s) [snip] > VIDEO: [DX50] 800x480 24bpp 30.000 fps 1976.5 kbps (241.3 kbyte/s) [snip] > There is a clear drop in amount of time used to output the videos for > 800x480 (the numbers were stable trough multiple runs). > > So I gather from the >10s benchmark time that we didn't get to real > time yet, but close to it? And of course this is just video, audio > decoding should be considered for real video playback performance > measurement. These videos are way too heavy for N800 to decode and play in realtime. We may expect playback for videos up to 640x480 resolution with <1000kbps bitrate and 24fps. This is probably current realistic limit which can be achieved. Some minor variations to these parameters are possible (for example we can get 30fps, but should also reduce resolution or bitrate, etc.). If you want a guaranteed video playback with divx/xvid/mpeg4 codecs, you should restrict to 512x384 resolution or lower and keep bitrate reasonable. The results for these 'insane' videos you have posted are somewhat weird, a complete statistics would require also a number of frames dropped, otherwise we don't know how much work was done by the player. Probably missing audio track resulted in MPlayer not being able to provide a proper report. Don't know. Also it is strange that you did not see any improvement at all for 640x480 video, are you sure you really tested it with the patched xserver? Anyway, the new xserver package works really good. If we do some tests with the standard Nokia_N800.avi video clip, we get the following results with the patched xserver: # mplayer -benchmark -quiet -noaspect Nokia_N800.avi BENCHMARKs: VC: 29,764s VO: 7,666s A: 0,468s Sys: 64,635s = 102,534s BENCHMARK%: VC: 29,0287% VO: 7,4767% A: 0,4565% Sys: 63,0381% = 100,% BENCHMARKn: disp: 2504 (24,42 fps) drop: 0 (0%) total: 2504 (24,42 fps) # mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi BENCHMARKs: VC: 30,266s VO: 5,490s A: 0,467s Sys: 66,286s = 102,509s BENCHMARK%: VC: 29,5255% VO: 5,3554% A: 0,4560% Sys: 64,6631% = 100,% BENCHMARKn: disp: 2501 (24,40 fps) drop: 0 (0%) total: 2501 (24,40 fps) Results with
Re: N800 & Video playback
On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote: > 2007/5/1, Siarhei Siamashka <[EMAIL PROTECTED]>: > > OK, thanks. It may take some time though. I'm still using old scratchbox > > with mistral SDK here (did not have enough free time to upgrade yet). > > Until I clean up my scratchbox mess, I can only provide some patch > > without testing, if anybody courageous can try to build it :) > > Given that I fear not the perils of building a X server with > nonstandard options[1], I shall be more than happy to conduct such > adventurous acts :) > > And unless Mr. Kulve has objections, the results could be installed > from a repository as well. > > [1] > http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html OK, here is this untested a patch for xserver to add ARMv6 optimized YUV420 color format conversion. Theoretically it should compile (I did not try to build xserver myself though) and work. If it refuses to compile, fixing the patch should be not too difficult. In the worst case only video playback may be broked. But if everything works as expected, video output performance should become a lot better. Video output performance can be tested by mplayer using -benchmark option, 'VO:' stat shows how much time was used for video output, 'VC:' stat shows how much time was used for video decoding. Built-in video player also should become faster. I don't know if this improvement can be 'scientifically' benchmarked, but it should drop less frames on high resolution video playback. If any of you can build xserver package with this patch, please put it for download somewhere or send directly to me. Thanks. diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am --- xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am 2007-03-05 16:17:32.0 +0200 +++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am 2007-05-01 15:04:43.0 +0300 @@ -1,5 +1,5 @@ if XV -XV_SRCS = omap_video.c +XV_SRCS = omap_video.c omap_colorconv.S omap_colorconv.h endif if DEBUG @@ -34,4 +34,4 @@ $(TSLIB_FLAG) \ $(DYNSYMS) -EXTRA_DIST = omap_video.c +EXTRA_DIST = omap_video.c omap_colorconv.S omap_colorconv.h diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h --- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h 1970-01-01 03:00:00.0 +0300 +++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h 2007-05-01 15:06:13.0 +0300 @@ -0,0 +1,45 @@ +/* + * Copyright © 2007 Siarhei Siamashka + * + * Permission to use, copy, modify, distribute and sell this software and its + * documentation for any purpose is hereby granted without fee, provided that + * the above copyright notice appear in all copies and that both that + * copyright notice and this permission notice appear in supporting + * documentation, and that the names of the authors and/or copyright holders + * not be used in advertising or publicity pertaining to distribution of the + * software without specific, written prior permission. The authors and + * copyright holders make no representations about the suitability of this + * software for any purpose. It is provided "as is" without any express + * or implied warranty. + * + * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO + * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND + * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR + * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER + * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF + * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN + * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Author: Siarhei Siamashka <[EMAIL PROTECTED]> + */ + +/* + * ARMv6 assembly optimized color format conversion functions + * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800) + */ + +#ifndef _OMAP_COLORCONV_H_ +#define _OMAP_COLORCONV_H_ + +#include + +/** + * Convert a line of pixels from YV12 to YUV420 color format + * @param dst - destination buffer for YUV420 pixel data, it should be at least 16-bit aligned + * @param src_y - pointer to Y plane, it should be 16-bit aligned + * @param src_c - pointer to chroma plane (U for even lines, V for odd lines) + * @param w - number of pixels to convert (should be multiple of 4) + */ +void yv12_to_yuv420_line_armv6(uint16_t *dst, const uint16_t *src_y, const uint8_t *src_c, int w); + +#endif diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S --- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S 1970-01-01 03:00:00.0 +0300 +++ xorg-server-1.1.99.
Re: N800 & Video playback
On Monday 30 April 2007 17:49, Daniel Stone wrote: > > ARMv6 optimized YV12->YUV420 convertor is about 2.5x faster > > than current code used in N800 xserver. So it should provide a nice > > improvement for video :) > > Indeed. Unfortunately this is slightly misleading in that it only shows > the raw write speed. RFBI can't deal with the sorts of speeds that your > hyper-optimised version is pumping out, e.g. So it's mainly just about > cutting the latency into the critical path to low enough that it makes > no difference. The 'framebuffer' is just the ordinary system memory, converting color format and copying data to framebuffer will be done with the same performance as simulated in this test. RFBI performance is only critical for asynchronous DMA data transfer to LCD controller which does not introduce any overhead and is performed at the same time as ARM core is doing some other work (decoding the next frame). RFBI performance matters only if data transfer to LCD is still not complete at the time when the next frame is already decoded and is ready to be displayed. When playing video, ARM core and LCD controller are almost always working at the same time performing different tasks in parallel. I think I had already explained these details in [1] Well, as xomap server is probably compiled for thumb, tried to compile this test program for thumb instructions set as well and got the following results (thumb is slower than normal ARM), also fixed some bug in test program which resulted in memory throughoutput statistics being slightly off, so the following results should be final now: # gcc -o test_colorconv -O2 -mthumb test_colorconv.c arm_colorconv.S # ./test_colorconv test: 'yv12_to_yuv420_xomap', time=9.493s, speed=25.394MP/s, memwritespeed=38.091MB/s test: 'yv12_to_yuv420_xomap_nobranch', time=8.516s, speed=28.306MP/s, memwritespeed=42.460MB/s test: 'yv12_to_yuv420_line_arm_', time=4.736s, speed=50.895MP/s, memwritespeed=76.343MB/s test: 'yv12_to_yuv420_line_armv5_', time=3.395s, speed=71.011MP/s, memwritespeed=106.517MB/s test: 'yv12_to_yuv420_line_armv6_', time=2.876s, speed=83.817MP/s, memwritespeed=125.726MB/s If you remember the information posted in [2], mplayer used 12 seconds for video output when playing Nokia_N800.avi (it contains the same number of frames of the same size as used in this test for benchmarking). Color format conversion code taken from xserver and compiled for thumb uses 9.5 seconds for doing the same amount of work. So now the results of the tests are consistent - when doing video output, most of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420' function. Optimizing it using 'yv12_to_yuv420_line_armv6' will definitely provide a huge effect, video output overhead when using Xv will be at least halved providing more cpu resources for video decoding. > > That's fine. Now I'm waiting for further instructions :) Should I try to > > prepare a complete patch for xserver? I'm really interested in getting > > this optimization into xserver as it would help to play high resolution > > videos. If you have any extra questions about the code or anything > > else (for example I wonder what free license would be appriopriate > > for it), don't hesitate to contact me. > > If you wanted to prepare a complete patch for the server, that would be > great, as I don't have time to get to it right now (trying to finish off > the merge with upstream, among others). As for the license, just the > standard MIT boilerplate in hw/kdrive/omap/* is fine, but replace Nokia > Corporation/Daniel Stone with Siarhei Siamaskha, obviously. > > > I did not try to build xserver sources yet as I did not have enough time > > for that and xserver requires quite a number of build dependencies. Can > > you share some tips and tricks about maemo xserver development. Is it > > difficult to compile (do I need any extra build scripts, tools, or > > configuration options) and install on N800 (is it safe to upgrade > > xserver on N800 from .deb file)? > > It's completely safe to upgrade from a deb if it's not broken. If you > set up a standard Maemo build environment and run apt-get source > xorg-server and apt-get build-dep xorg-server, it should work just fine, > in theory. > > I don't have any tips, per se. Once I get it all integrated it'll be in > git, but for now, the only public source is the packages. OK, thanks. It may take some time though. I'm still using old scratchbox with mistral SDK here (did not have enough free time to upgrade yet). Until I clean up my scratchbox mess, I can only provide some patch without testing, if anybody courageous can try to build it :) > > I also tried to use YUV420 on Nokia 770, but it did not work well. > > According to Epson, this format should be supported by hardware. Also > > there is a constant OMAPFB_COLOR_YUV420 defined in omapfb.h in Nokia 770 > > kernel sources. But actually using YUV420 was not very successful. Full > > screen update 800x48
Toolchain upgrade? (Was: Instructions cache flush on ARM)
On Tuesday 24 April 2007 10:56, you wrote: > > By the way, do you have any plans for upgrading toolchain? Either I'm > > extremely unlucky, or current toolchain is really very buggy. > > You can see the known issues from the GCC bugzilla. > There are a few bugs in C++ support which have been fixed > in gcc 3.4.6 (Maemo toolchain is 3.4.4) or 4.x. But doesn't current maemo toolchain have lots of modifications to backport EABI support which only officially appeared in gcc 4.x? These modifications might have introduced some additional instability. > > It does not support -pg option properly (for profiling with gprof), > > Hm. The toolchain might not be built with -pg support. > As to using gprof, that produces fairly unreliable results. > I'd recommend building Oprofile kernel and latest oprofile > user-space tools. Maybe Oprofile is good, but gprof is better than nothing and does not require recompiling kernel. > > also I encountered at least one internal compiler error and a couple of > > invalid code generation bugs already. > > C++ code generation? Or C? (GCC bugzilla mentions only C++ > code generation issues) I have encountered the following problems on C code (MPlayer). ICE: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22177 Definitely invalid code generation in inline asm (but the same bug apparently shows up in gcc 4.1.1 as well): http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693 Invalid code generation suspected: https://garage.maemo.org/tracker/index.php?func=detail&aid=254&group_id=54&atid=269 https://garage.maemo.org/tracker/index.php?func=detail&aid=763&group_id=54&atid=269 I did not investigate these two last problems thoroughfully (this might be probably some bad code in MPlayer with 'undefined behaviour' which works better on some compilers but breaks on the others), but they disappear when compiling with gcc 4.1.1 crosscompiler (outside scratchbox using gentoo crossdev). > ICE you can get around by trying another optimization level > (sometimes -Os or -O3 works where -O2 doesn't). Well, I'm worried not about how to workaround ICE but about the overall quality of the compiler. I wonder how many compiler related bugs are lurking in maemo software but are not caught yet? But again, maybe I'm just unlucky to get hit by more bugs than the others :) Did anybody try installing newer toolchains in scratchbox and use them with maemo SDK? I just don't have much free time for these experiments and don't want to break my installation of scratchbox which works now (more or less acceptable) Building packages with new toolchain would probably need to have libstdc++ linked statically for C++ applications to work on 770/N800, but otherwise everything should be fine. > > One more question is about the kernel, ARM11 seems to support unaligned > > memory access in hardware, but this feature is not enabled on N800. > > What the "seems", "to support" and "feature enabled" mean in > the above clause? Seems how? And what is result? Enabled what? "seems" is a standard disclaimer which means that I did not work with these features myself, only read this information from docs and can't be sure if I understood everything correctly :) > ARM CPU is able to trap them? Kernel could SIGBUS the co. processes? > (as unaligned access has AFAIK undefined results on ARM, is often > coding error and "fixing" those accesses on kernel side has definitive > performance penalty) http://arm.com/documentation/ARMProcessor_Cores/index.html 'ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual' Chapter 4 'Unaligned and Mixed-Endian Data Access Support' As ARM11 core used in N800 is little endian, does have floating point unit and supports unaligned memory access in hardware (which only needs to be enabled). It probably doesn't have any serious portability issues to be aware of anymore and vast majority of software initially developed for x86 should be easy to compile and run on it even without doing any modifications. Enabling unaligned memory support will make life much easier for developers unfamiliar with ARM platform. The number of applications for N800 should grow up, as less newbee developers will be turned away frustrated by the alignment bugs they have never heared about before. But this will be to some extent a bad thing for Nokia 770, as it will result in more applications usable on N800, but buggy on 770 ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Friday 27 April 2007 04:43, Daniel Stone wrote: > > I'll make a really optimized version of YV12 -> YUV420 convertor on this > > weekend (removing branch is good, but I feel that it can be improved > > more) and will try to use it on Nokia 770, any extra video performance > > improvement will be useful there. I hope that the framebuffer driver on > > Nokia 770 supports YUV420 color format properly. > > I don't think Tornado supports YUV420, but I can check in the specs > tomorrow. My better C version basically does two macroblocks at a time, > ensuring all 32-bit writes (which _really_ helps over 16-bit writes, > believe me). This eliminates the branch, since your surface is > guaranteed to be word-aligned, so if you do all 32-bit writes, you can > just drop the branch as you know every write will be aligned. > > This will be really fast. Optimized YV12 -> YUV420 convertor is done. The sources can be found here: https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer Take a look at 'arm_colorconv.h' and 'arm_colorconv.S' files. Also there is a test program ('test_colorconv') which can ensure that everything works correctly and fast: ~ $ ./test_colorconv test: 'yv12_to_yuv420_xomap', time=7.332s, speed=32.878MP/s, memwritespeed=43.838MB/s test: 'yv12_to_yuv420_xomap_nobranch', time=5.679s, speed=42.448MP/s, memwritespeed=56.597MB/s test: 'yv12_to_yuv420_line_arm_', time=4.706s, speed=51.223MP/s, memwritespeed=68.297MB/s test: 'yv12_to_yuv420_line_armv5_', time=3.356s, speed=71.824MP/s, memwritespeed=95.765MB/s test: 'yv12_to_yuv420_line_armv6_', time=2.826s, speed=85.298MP/s, memwritespeed=113.731MB/s ARMv6 optimized YV12->YUV420 convertor is about 2.5x faster than current code used in N800 xserver. So it should provide a nice improvement for video :) I doubt that your better C version can beat it or even get any close. There are two important optimizations in this code: 1. Cache prefetch with PLD instruction (added in '_armv5' version) which boosts performance to 70 megapixels per second. Inner loop is unrolled to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so such unrolling is convenient). This is the most important improvement. You can try using __builtin_prefetch() from C code to do the same optimization. 2. The use of ARMv6 instruction REV16 to do bytes swapping for high and low 16-bit register parts, this optimization was added in '_armv6' version and boosted performance even more to 85 megapixels per second. This optimization is highly unlikely probably impossible for C version at all. I was a bit wrong about YUV420 format in my previous post. Suppose we have planar YV12 image with the following data. Y plane: Y1 Y2 Y3 Y4 ... U plane: U1 __ U2 __ ... Normal YUV420 (according to pictures in Epson docs) would be the following: U1 Y1 Y2 U2 Y3 Y4 ... But appears (most likely because of 16-bit interface and some endian differences between ARM and Epson chip) that each pair of bytes is swapped and we actually get the following somewhat weird layout: Y1 U1 U2 Y2 Y4 Y3 ... To do this byteswapping, ARMv6 instruction REV16 is very handy. The assembly sources for ARMv6 code look a bit messy because instruction reordering was needed to correctly schedule them and avoid ARM11 pipeline interlocks which negatively affect performance. Now this code is really fast with very little or no interlocks in the inner loop. And gcc does not do a good job optimizing code on ARM, so C implementation would be also at disadvantage here. By the way, the benchmarks posted in my previous message should be discarded. I did not initialize source buffers that time and looks like ARM11 cpu has some 'cheat' which allows treating empty data pages in some special way and avoid reading from memory. So the numbers posted in the previous benchmark were higher than usual. Now it is corrected. As for the other possible Xv optimizations. You mentioned that fallback code is not important at all. But imagine 640x480 video playback in windowed mode. Decoding it will require quite a lot of resources, but additionally scaling it down using a slow fallback code will be a finishing blow. In addition, a solution (fast JIT accelerated YV12->YUY2 scaler) for this problem already exists. I can also modify this scaler to support YV12->YUV420 scaling. An interesting thing here is that this scaler could be also used by xserver to solve graphics bus bandwidth issues. Imagine that we have some high resolution video with high framerate which exceeds graphics bus capabilities. In this case this video can be downscaled in software using JIT scaler to lower resolution before sending data to LCD controller. What do you think? > Sure. Unfortunately my job has other functions than to make video > decoding really, really fast, so I'm happy to merge, review, offer > feedback, and help you out where I can be useful, but I can't throw much > time at this myself. That's
Re: N800 & Video playback
On Tuesday 24 April 2007 12:36, Daniel Stone wrote: > > My main performance concern is exactly about this > > 'omapCopyPlanarDataYUV420' function. My experience from Nokia 770 video > > output code optimization shows that optimization effect can be really > > huge (it was 1.5x improvement on Nokia 770 for unscaled YV12 -> YUY2 > > conversion going from a simple loop in C to optimized assembly code, I > > provided a link to the relevant code in my previous post). But N800 code > > can be probably improved more because now it contains unnecessary branch > > in the inner loop and branches are expensive on long pipeline CPUs. Such > > color format conversion performance should be comparable to that of > > memcpy if done right (it is about half memcpy speed on Nokia 770 for > > unscaled YV12 -> YUY2 conversion). > > Right, the branch is a problem, and as I said, the branch can be avoided > and the writes optimised to be three 32-bit writes for two macroblocks, > instead of two 32-bit writes and two 16-bit writes. I did not have much free time to do complete tests, but initial benchmarks show that actually even removing this branch and using three 16-bit writes improves performance quite significantly. The test program is here: http://ufo2000.sourceforge.net/files/yuv420test.c It produces the following results if compiled with optimization options "-O3 -fomit-frame-pointer -mcpu=arm1136j-s": # ./yuv420test test: 'yv12toyuv420_xomap', time=5.220, memory bandwidth=61.576MB/s test: 'yv12toyuv420_yv12toyuv420_branch_removed', time=3.503, memory bandwidth=91.754MB/s An interesting thing about this test is that it uses 2504 frames 400x240 each, that's the same number of frames as Nokia_N800.avi video has. And mplayer spent 12,365s on video output when playing this video while YV12->YUV420 conversion should have taken 5.220s as benchmarked in this test. So now color conversion is roughly half of the time spent on video output for this resolution. Some tests with higher resolution videos will be done later. As you see from the benchmark results, we can get 1.5x improvement already for color conversion with just a trivial removal of a piece of redundant code. Was that branch in the code supposed to improve performance? Seems like it resulted in quite the opposite effect. I'll make a really optimized version of YV12 -> YUV420 convertor on this weekend (removing branch is good, but I feel that it can be improved more) and will try to use it on Nokia 770, any extra video performance improvement will be useful there. I hope that the framebuffer driver on Nokia 770 supports YUV420 color format properly. By the way, does anybody know if it is possible to enable tearsync support on Nokia 770 (by backporting some changes from N800 kernel or in some other way)? > However, I don't think the lessons from the 770 are necessarily > _directly_ applicable to the N800: on the 770, our bottleneck is > decoding speed. The bottleneck on the N800 is exactly the opposite: > video output. I can't agree here. Memory speed is actually a lot faster on N800, the only trouble is graphics bus performance, but sending data to LCD controller through this bus does not introduce any load on ARM core and it can freely decode the next frame of video at the same time. At least this was the case with the previous version of firmware (I did not have enough time to see what was changed in framebuffer API and do any video tests with it). But color conversion is done by ARM core and it consumes precious cpu cycles which could be used for decoding higher resolution/bitrate video. Optimizing color conversion will improve video performance. The improvement will be most likely only within a few percents overall, but every little bit helps. > Bear in mind that, unless you explicitly disable it (the Xv attribute is > something like XV_OMAP_VSYNC), the X server _will_ flush all pending > writes before the next frame is put through. Else you get tearing, > because you can be halfway through an update, and writing the next frame > to the framebuffer, so which frame is being picked up, changes halfway > through. > > Try forcing XV_OMAP_VSYNC (or whatever it is) to 0, and comparing the > results. OK, thanks, I'll try this test too and check if it affects Xv performance. But I thought that using 12bpp color format _and_ sending only as much data as needed should solve the problem. Of course 800x480 * 16bpp * 30fps would be 23MB/s and it is too much. But for example 640x480 * 12bpp * 30fps = 12.3MB/s. Is the graphics bus fast enough to handle this? Or is there some other problem I'm not aware of? > > N800 is almost able to play VGA resolution videos properly, it only needs > > a bit more optimizations. Color format conversion performance for video > > output is one of the important things that can be improved. > > I don't believe it's on the critical path. The optimisation I mentioned > before will bring us up to the point where any impr
Re: N800 & Video playback
On Friday 20 April 2007 10:39, you wrote: > The primary conversion we do isn't planar -> packed (this is a fallback > for when the video is obscured), but from planar to another custom > planar format. It would be good to get ARM assembly for the fallback > path, but most of the problem when using packed lies in having to > transfer the much larger amount of data over the bus. It is only a problem of definition :) Whatever it is, packed or planar, this YUV420 format is not YV12. So it still needs conversion which is performed by only reordering bytes and is not much different from packed YUY2 (except that it requires less space and bandwidth). > There's one optimisation that could be done for the YUV420 conversion > (the custom planar format that Hailstorm takes), which removes a branch, > ensures 32-bit writes always (instead of one 32-bit and one 16-bit per > pixel), and unrolls a loop by half. Might be interesting to see what > effect this has, but I think it'll still be rather small. My main performance concern is exactly about this 'omapCopyPlanarDataYUV420' function. My experience from Nokia 770 video output code optimization shows that optimization effect can be really huge (it was 1.5x improvement on Nokia 770 for unscaled YV12 -> YUY2 conversion going from a simple loop in C to optimized assembly code, I provided a link to the relevant code in my previous post). But N800 code can be probably improved more because now it contains unnecessary branch in the inner loop and branches are expensive on long pipeline CPUs. Such color format conversion performance should be comparable to that of memcpy if done right (it is about half memcpy speed on Nokia 770 for unscaled YV12 -> YUY2 conversion). But only benchmarks can be a real proof, any premature speculations are useless and even harmful. Do you remember the times when nobody from Nokia believed that ARM core could be good for video decoding on 770? ;-) Testing with Nokia_N800.avi video on N800: # mplayer -benchmark -quiet -noaspect Nokia_N800.avi BENCHMARKs: VC: 29,525s VO: 15,029s A: 0,453s Sys: 59,919s = 104,925s BENCHMARK%: VC: 28,1390% VO: 14,3232% A: 0,4313% Sys: 57,1065% = 100,% BENCHMARKn: disp: 2511 (23,93 fps) drop: 0 (0%) total: 2511 (23,93 fps) Enabling direct rendering (avoids extra memcpy in mplayer, but requires to disable OSD menu): # mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi BENCHMARKs: VC: 29,826s VO: 12,365s A: 0,437s Sys: 60,555s = 103,182s BENCHMARK%: VC: 28,9058% VO: 11,9833% A: 0,4236% Sys: 58,6873% = 100,% BENCHMARKn: disp: 2504 (24,27 fps) drop: 0 (0%) total: 2504 (24,27 fps) Testing the same video on Nokia 770: # mplayer -benchmark -quiet -noaspect Nokia_N800.avi BENCHMARKs: VC: 44,982s VO: 7,998s A: 0,884s Sys: 47,936s = 101,801s BENCHMARK%: VC: 44,1862% VO: 7,8568% A: 0,8688% Sys: 47,0882% = 100,% BENCHMARKn: disp: 2502 (24,58 fps) drop: 0 (0%) total: 2502 (24,58 fps) So Nokia 770, having slower CPU, slower memory and using less efficient output format (16bpp vs. 12bpp), still requires less time for video output than N800 (7,998s vs. 12,365s). Graphics bus performance is unrelated here as it is asynchronous operation and it is fast enough. Surely N800 also has some extra overhead because of interprocess communication with xserver, but looks like YV12 -> YUV420 conversion is quite a bottleneck here too. It should be noted that while Nokia_N800.avi video has low resolution and N800 has no problems decoding and displaying it, our goal is higher resolution videos such as 640x480. Getting to higher resolutions will increase color format conversion overhead. As it can be seen from these benchmarks, video output on N800 takes quite a significant time when compared with time needed for decoding (29,826s for decoding, 12,365s for video output). I can make an assembly optimized code for YV12 -> YUV420 conversion. Is there any chance that such optimization could be also integrated into xserver in one of the next firmware updates if it really provides a significant performance improvement? N800 is almost able to play VGA resolution videos properly, it only needs a bit more optimizations. Color format conversion performance for video output is one of the important things that can be improved. > > So for any performance optimizations experiments which result in > > immediate video performance improvement, either direct framebuffer access > > should be used again or it would be very nice if xserver could provide > > direct access to framebuffer (video planes) in yuy2 and that custom > > yuv420 format in one of the next firmware updates. The xserver itself > > should not do any excess memory copy operations as they degrade > > performance (and it does such copy for yuy2 at least). > > 'Direct framebuffer access'? As in, just hand you a pointer to a > framebuffer somewhere and let you write straight to it? As this would > require a firmware update an
Re: Instructions cache flush on ARM (was: N800 & Video playback)
On Monday 23 April 2007 16:49, Guillem Jover wrote: > > You are right. gcc has function > > void __clear_cache (char *BEG, char *END) > > which should be more portable. > > It should, but it still has the problem of emitting an OABI syscall > due to our old gcc. > > You could use syscall(2) and __ARM_NR_cacheflush instead. Yes, but __clear_cache(char *BEG, char *END) works fine with the current combination of gcc and kernel in maemo. So I guess it's the best option if portability is desired. If you decide to drop support for old ABI from kernel without upgrading gcc, that would be a bug in maemo platform :-) By the way, do you have any plans for upgrading toolchain? Either I'm extremely unlucky, or current toolchain is really very buggy. It does not support -pg option properly (for profiling with gprof), also I encountered at least one internal compiler error and a couple of invalid code generation bugs already. One more question is about the kernel, ARM11 seems to support unaligned memory access in hardware, but this feature is not enabled on N800. Is it done for consistency with Nokia 770? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Instructions cache flush on ARM (was: N800 & Video playback)
On Friday 20 April 2007 19:04, you wrote: > > I have seen your code in xserver which does the same job for downscaling, > > but in nonoptimized C and with much higher impact on quality. Using JIT > > scaler there can improve both image quality and performance a lot. The > > only my concern is about instruction cache coherency. As ARM requires > > explicit instructions cache flush for self modyfying or dynamically > > generated code, I wonder if using just mmap is safe (does it flush cache > > for allocated region of memory?). Maybe maemo kernel hackers/developers > > can help with this information? > > arm linux support flush icache by syscall "cacheflush", > > qemu have this function: > static inline void flush_icache_range(unsigned long start, unsigned long > stop) > { > register unsigned long _beg __asm ("a1") = start; > register unsigned long _end __asm ("a2") = stop; > register unsigned long _flg __asm ("a3") = 0; > __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" > (_flg)); > } > > you can reference kernel source arch/arm/kernel/traps.c and > include/asm-arm/unistd.h Thanks, it works. But I'm worried about [1]. Looks like EABI has a new syscall interface and this code from qemu uses old ABI. And from reading description at the wiki page, compatibility with old ABI can be disabled (and it makes sense disabling it as this compatibility reduces performance a bit). I wonder if there is a better portable solution (running on any ARM linux or even better on any POSIX compatible system). It would be reasonable to assume that allocating memory with mmap implies that we are going to execute code from that area and instructions cache should be flushed for it: mmap(0, some_buffer_size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); But I wonder if mmap requesting executable block of memory really does instructions cache flush in reality? I just want to submit this ARM optimized scaler to upstream ffmpeg and want to make it as portable as possible. 1. http://wiki.debian.org/ArmEabiPort#head-96054c6cb4209b4a589e645dd50ac0fe133b8ced ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 & Video playback
On Monday 19 March 2007 22:34, you wrote: > Again, if there are any particular questions I can answer, don't be > subtle: ask me straight up. If I can answer them (some things I can't > necessarily say, some things I don't necessarily know), I will. Thanks, here we go and sorry for a long delay with this answer. First thanks for Xv update which makes it really usable now, MPlayer now uses Xv video output on N800 by default. But there are still some problems. Using unmodified upstream MPlayer code for Xv (N800 with 3.2007.10-7 firmware at the moment) does not work good. It has two at least problems: 1. Lockups which look like cycling two sequential frames, very similar or the same problem as https://maemo.org/bugzilla/show_bug.cgi?id=991 Also keypresses are not very responsive. A fix (or workaround) required changing XFlush to XSync in screen update code, now it looks a lot better. 2. Switching windowed/fullscreen mode generally makes mplayer terminate with the following error messages: "X11 error: BadValue (integer parameter out of range for operation)" "Xlib: unexpected async reply (sequence 0x5db)!" A workaround to make this problem less frequent was a code addition which prevents screen updates until we get Expose even notification. All these Xv patches for MPlayer code can be viewed here: https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&diff_format=h&view=rev&rev=166 I really don't know much about X11 programming and only started to learning it, so your help with some advice may be very useful. Looks like MPlayer code X11/Xv output code is a big mess with many tricks and workarounds added to work on different systems over time. Maybe it contains some bugs which get triggered on N800 only, but apparently this code is used for other systems without any problems. Can you try experimenting a bit with MPlayer (upstream release) yourself to check how it works with N800 xserver? Maybe it can reveal some xserver bugs which need to be fixed? Also if MPlayer has some apparently bad X11 code, preparing a clean patch and submitting it upstream maybe a good idea. One more strange thing with Xv on N800 can be reproduced by trying to watch standard N800 demo video in MPlayer. It has an old familiar tearing line in the bottom part of the screen and the performance is very poor. The same file plays fine in the standard video player. The only difference is that mplayer respects video aspect ratio (this video is not precisely 15:9 but slightly off) and shows some small black bands above and below picture and default video player scales it to fit the whole screen. Disabling aspect ratio in mplayer with -noaspect option also 'fixes' this problem. Using benchmark option we get the following numbers: # mplayer -benchmark -quiet Nokia_N800.avi [...] BENCHMARKs: VC: 33,271s VO: 66,768s A: 0,490s Sys: 5,703s = 106,232s BENCHMARK%: VC: 31,3189% VO: 62,8517% A: 0,4614% Sys: 5,3681% = 100,% BENCHMARKn: disp: 1732 (16,30 fps) drop: 778 (30%) total: 2510 (23,63 fps) # mplayer -benchmark -quiet -noaspect Nokia_N800.avi [...] BENCHMARKs: VC: 32,226s VO: 14,350s A: 0,456s Sys: 55,699s = 102,731s BENCHMARK%: VC: 31,3694% VO: 13,9687% A: 0,4439% Sys: 54,2180% = 100,% BENCHMARKn: disp: 2501 (24,35 fps) drop: 0 (0%) total: 2501 (24,35 fps) So when showing video with proper aspect ratio, we get tearing back and more than 4x slowdown in video output code (66,768s vs. 14,350s). This all results in 30% of frames dropped. These were the 'usability' problems with Xv. Now we get to performance related issues. As YV12 is not natively supported by hardware, some color format conversion and bytes shuffling in video output code is unavoidable. It is a good idea to optimize this code if we need a good performance for high resolution video playback. Color format conversion can be optimized using assembly, for example maemo port of mplayer has a patch for assembly optimized yv12-> yuy2 (yuv420p -> yuyv422) nonscaled conversion which provides a very noticeable ~50% improvement on Nokia 770: https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&rev=129&view=rev Also here is a JIT accelerated scaler for yv12-> yuy2 (yuv420p -> yuyv422) conversion, it is very fast and supports pixels interpolation (good for image quality) : https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer I have seen your code in xserver which does the same job for downscaling, but in nonoptimized C and with much higher impact on quality. Using JIT scaler there can improve both image quality and performance a lot. The only my concern is about instruction cache coherency. As ARM requires explicit instructions cache flush for self modyfying or dynamically generated code, I wonder if using just mmap is safe (does it flush cache for allocated region of memory?). Maybe maemo kernel hackers/developers can help with this information? It should be noted, that all this assem
Re: N800 & Video playback
On Tuesday 20 March 2007 15:03, Klaus Rotter wrote: > > On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote: > >> The memory bandwidth to the N800 LCD framebuffer is 3 times slower that > >> the bandwidth in the N770? Is it really _that_ big? > > > > Siarhei's calculations were correct, so, yes. > > Bad... the N770 interface wasn't the fasted either. So we have even a > more slow down. There is one important thing to note. Screen updates are asynchronous and are performed simultaneously with CPU doing some other useful things at the same time. Screen updates do not introduce any overhead or affect performance (at least I did not notice any such effect). So insanely boosting graphics bus performance will not provide any improvements at all once it is capable to sustain acceptable framerate. And what is acceptable depends on applications. Video may require higher framerate, but it is both high resolution and high framerate movies that may exceed graphics bus capabilities, in this case video will be still played (if cpu is fast enough to decode it, that's another story) but with some frames skipped and many people will not even notice any problems. Quite a lot of people are even satistied with 15fps transcoded video, so getting maybe 20-25fps (random guess) on some videos instead of 30fps is not so bad. Tearing at the bottom is most likely caused by screen update time being longer than two LCD refresh cycles. With tearsync enabled, both screen update and refresh cycle start at the same time, refresh is faster, so we still see the previous frame on the screen. When the first refresh cycle completes, screen buffer is slightly less than half updated at that moment. The second LCD refresh cycle starts displaying the data from the new image, while screen buffer still continues to get updated, but not fast enough to complete before this second LCD refresh cycle catches up not too far from the bottom part of the screen. If the screen update was faster than two refresh cycles, there would be no tearing visible. Screen update only needs to be 15-20% faster to achieve this. If improving graphics bus performance does not work, I wonder if it is possible to to reduce LCD refresh rate instead? Anyway, I think it is better to believe Daniel and wait for the new firmware update :) > On the N770 there was the feature (with SDL games) of > doubling the pixels by hardware with a X-server extension. Will this > feature be available in the new kernel / X11 server for the N800? It > would be great if it would use the same API. Doubling pixels will definitely reduce the load on the graphics bus so that its bandwidth should become not an issue. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: Support for frame buffer in N800
On Wednesday 21 March 2007 09:58, Sampath Goud wrote: > I want to know if there is frame buffer support in N800. > I have written a simple application (drawing a pixel) on frame buffer and > tried to execute it on N800 in root mode. > But it prompts the message "permission denied". > Please let me know if there is support for frame buffer in N800. If there > is support then how can I use it? Yes, it is available. You should use /dev/fb0 if you want RGB color format (/dev/fb1 and /dev/fb2 are used for video planes generally in YUV color format). But do you absolutely need to use framebuffer? Using framebuffer directly introduces a lot of problems synchronizing access to it with xserver (and it can't be done in a completely right way currently as far as I know). ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
N800 & Video playback
Hello All, I did some tests with the framebuffer when trying to find a way to reduce tearing effect in MPlayer. Here are the results. I did a mistake when I assumed that screen updates are synchronous for video planes. They are actually asynchronous just like with Nokia 770, but just a lot slower so that it is not possible to keep framerate in real time. So we get blocking when trying to display the next frame and the previous screen update is still in process. If we look at the framebuffer API. There are two ioctl important for screen updates and tearing synchronization if I understand them correctly now: * ioctl(fd, OMAPFB_UPDATE_WINDOW, &update); This ioctl initiates asynchronous screen update from a 'framebuffer' memory (actually that's just a system memory) to a graphics chip. If a previous screen update request is still incomplete, this call blocks and waits until it is done. The structure 'update' can have OMAPFB_FORMAT_FLAG_TEARSYNC flag set which instructs the framebuffer driver to wait internally (ioctl call is not blocked) and start data transfer on the start of the next LCD internal screen refresh (aparently refresh rate is something ~60Hz, but the numbers are below). * ioctl(fd, OMAPFB_SYNC_GFX); This ioctl call ensures that current screen update is done (blocking until data transfer is complete). Perforrming both these ioctls consequently we can benchmark screen update performance. The results are the following. On N800 (OS2007 2.2006.51-6), every YUV screen update (OMAPFB_COLOR_YUY422) takes about 41ms without tearsync enabled and 41-58ms with tearsync. It does not matter what video resolution we try to watch, the result is the same. So the maximal screen update rate is about ~24fps without tearsync and ~20fps on average with tearsync. That's not enough to watch 30 fps video and achieving 24 fps is theoretically possible, but very tricky and unrealistic. If tearsync comes into action, watching full framerate videos is impossible now. Analyzing the difference in screen update times with vsync, looks like full cycle of LCD internal refresh takes ~17ms (that's ~60Hz, but as the precision is not good, that may be something else, 50Hz for example). Nevertheless screen update on N800 can't be completed for these 17ms and tearsync does not work perfectly (most likely it can fill the screen up to the bottom horizontal line observed on playing video). If we try RGB screen updates, we can see that the time needed for screen update gets lower for updating smaller screen regions). The numbers are the following (without tearsync enabled): 640x480: 33.7ms 400x230: 10.2ms 320x240: 8.5ms Of course RGB screen updates are not very suitable for video as we would lose much more time doing YUV->RGB conversion. If we benchmark the screen update performance on Nokia 770, the numbers are: 640x480: 11.1ms 320x240: 2.9ms (that's fullscreen playback with pixel doubling) If we estimate bus performance on Nokia 770, it is ~55MB/s and is more than enough to display 800x480 sized video frames with 30 fps. Adding a tearsync would be a nice addition, as 11.1ms for 640x480 screen update time is lower than 17ms LCD refresh cycle. And in the worst case of video sync when we get 11.1ms+17ms=28.1ms for a single frame, it will be still capable of displaying 35 fps at the very least. So any resolution video can be played with perfect quality given enough cpu performance for video decoding (that's a real bottleneck on Nokia 770). Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might be caused by inefficient framebuffer driver implementation in its initial revision. But if it is a hardware issue, getting normal video playback at native framerate may be troublesome. Performing software downscaling of video before sending data to the graphics chip may be a solution, but it sacrifices image quality. Switching to 12bit YUV format from 16bit will save ~33% of bus bandwidth, but it can't compensate 3x performance regression and may be not enough for 30 fps fullscreen video playback. Right now, I can workaround tearing somewhat, but some frames will have to be skipped, resulting in somewhat jerky playback (even for transcoded video unless framerate is halved). Apparently the same issue applies to emulators, games and other software. As Daniel explained, the next firmware will bring a big improvement in this area. I'm not sure whether it is worth to release the next version of MPlayer before that, since it will still be far from perfect on N800. A preview of the next kernel for beta testing might reduce time needed to get MPlayer fully working on N800, but I'm not demanding or expecting anything. It is just a matter of time anyway and I'm not so impatient :) I would be grateful for any comments and corrections. Some things are not yet clear to me, figuring them out myself is just a waste of time that could be spent on something more useful. Even a small hint may save a huge amount of time. PS.
Re: DVD content playback possible or not? Re: Wishlist (was:Re: N800 and USB host mode)
On Saturday 10 March 2007 01:57, Daniel Stone wrote: > On Fri, Mar 09, 2007 at 10:34:52PM +0200, ext Siarhei Siamashka wrote: > > On Friday 09 March 2007 12:20, Daniel Stone wrote: > > > Not really. The next firmware release has gone to great lengths to > > > improve video performance by doing scaling on the LCD controller, as > > > well as the colourspace conversion. I think you'll be pleasantly > > > surprised. ;) > > > > Thanks, that's a very good news. We all are looking forward for this > > firmware update. By the way, is it possible to get an early access to the > > updated kernels in the future for the purpose of testing and ensuring > > compatibility? > > It's a kernel and large X server update. Unfortunately I'm not in a > position to be able to release them to the public. That's why I asked this question in the mailing list. I hope that somebody in a position to make such decision is reading it. Nokia did some beta releases of OS2006 before, so maybe it could be possible to continue this tradition? > > N800 is a bit different with a more complicated framebuffer driver with a > > support for more hardware features (such as a very high quality hardware > > scaler), but its graphics chip does not seem to support planar YUV color > > formats, so something else (ARM core?) should do the conversion wasting > > the same ~20% of resources. By the way, did you consider trying to use > > DSP at least for unscaled planar->packed color format conversion? It > > should provide some improvement at least theoretically. > > The LCD controller takes in a planar format, so we indeed avoid that > conversion. The bottleneck, though, isn't CPU or memory load, but the > bus between the display controller and the LCD controller. So it > doesn't matter where we do the conversion, we just have to minimise the > load. Sending 12bpp (i.e., pre-scaled) video over instead of 16bpp > post-scaled is obviously a pretty huge win. I'm not quite sure what do you mean by 'LCD controller' and 'display controller'. Which one of them is the Epson chip? Is there anything done by OMAP chip in this scheme? OK, looks like I'll have to take a look at the sources when they get released. The matter where we do the conversion is actually very important, if it is done by ARM core, we'll have less cpu resources available for decoding video. Sending 12bpp will be surely a win, I just preferred 16bpp packed YUV format on Nokia 770 because it used the same layout as RGB565 and could be placed into the same buffer. As N800 supports different planes for video, this is not an issue anymore and using 12bpp format should be fine. > The X server does all this for you -- the semantics are, uhm, > 'nightmarish'; the LCD controller can't do colourkeyed video, only a > single cliprect. The Xv support already has this worked out, including > automatic migration of your videos when a menu gets popped out or > whatever. And it quite rightfully expects that it's the only thing > managing the framebuffer, so your planes may well get stomped. You > really want to use it. > > (Is there any special reason why you want to do it directly? If so, let > me know, and I'll see if I can introduce support for what you're trying > to do in the X server.) Yes, I really want to use X server and Xv. But if using it sacrifices performance a lot, we just have to look at some other options. I'll have a look at Xv in the coming firmware update and if it is good enough, I'll be happy to drop a hack using direct framebuffer access in MPlayer for N800. It is too early (or too late) to discuss it, but maybe some kind of Xsp extension for video support to precisely match hardware capabilities could be developed? If LCD controller has problems with colourkeyed video, that's ok, not everyone needs it. If we need to make a choice whether to sacrifice compatibility or performance, I myself would prefer to keep good performance. That's why I'm still experimenting with framebuffer. > > CPU performance for video > > decoding is still another bottleneck. It is even worse bottleneck than > > video output as you can skip displaying of some frames, but you can't > > skip decoding. > > We aren't able to hit a situation where the CPU is an absolute > bottleneck, except maybe with some absurdly complicated codec. I > haven't seen this arise yet. Hanno Zulla already raised this issue, an example of such absurdly complicated codec is MPEG2. One more example is DIVX, I would like to ensure that all the video samples from the following page can be played smoothly: http://www.divx.com/movies/browse.php?category
Re: DVD content playback possible or not? Re: Wishlist (was:Re: N800 and USB host mode)
On Friday 09 March 2007 12:20, Daniel Stone wrote: > On Fri, Mar 09, 2007 at 09:45:03AM +0100, ext Hanno Zulla wrote: > > > Right now, the biggest bottleneck in video decoding is RFBI bandwidth > > > (i.e. the bus between OMAP and the LCD controller we use), being too > > > slow to push more than ~15fps through at 800x480. Beefing up the > > > processor-side decoding doesn't help. We've been working on this and > > > the next firmware update will give you significantly faster video (with > > > a couple of caveats). > > > > > > So it's mostly just down to the large image display, which more or less > > > suffers from the same problem. I don't think it would give us much > > > benefit at all. > > > > So from the hardware side, it is definitely no-matter-what-you-try > > impossible to play DVD video content on the N800, even if there was help > > from the DSP? > > Not really. The next firmware release has gone to great lengths to > improve video performance by doing scaling on the LCD controller, as > well as the colourspace conversion. I think you'll be pleasantly > surprised. ;) Thanks, that's a very good news. We all are looking forward for this firmware update. By the way, is it possible to get an early access to the updated kernels in the future for the purpose of testing and ensuring compatibility? I wonder what improvements the new framebuffer driver will bring to us. As far as I understand the situation with the current firmware, the problem is in having to do planar->packed YUV conversion at ARM core and synchronous screen update for anything involving planes. Graphics system in Nokia 770 could perform YUV screen updates asynchronously with DMA consuming only ~20% cpu resources for 640x480 24 fps video output (these ARM core resources were used for planar->packed color format conversion and scaling). N800 is a bit different with a more complicated framebuffer driver with a support for more hardware features (such as a very high quality hardware scaler), but its graphics chip does not seem to support planar YUV color formats, so something else (ARM core?) should do the conversion wasting the same ~20% of resources. By the way, did you consider trying to use DSP at least for unscaled planar->packed color format conversion? It should provide some improvement at least theoretically. And a few questions about the future frambuffer driver. I know that the pixel doubling feature should be fixed in the next firmware. Will this driver also support YUV color format for regular screen updates (without using planes) just like N770? I would prefer some kind of stateless API that would not allow to screw up the device when something gets wrong (having some planes enabled at abnormal exit makes it impossible to work with the device and requires a reboot). And one more minor question is about YUV format constants in framebuffer. OMAPFB_COLOR_YUV422 constant for N770 specifies the same color format as OMAPFB_COLOR_YUY422 for N800, why did you have to introduce a new constant? All in all, while video output issues can be solved, CPU performance for video decoding is still another bottleneck. It is even worse bottleneck than video output as you can skip displaying of some frames, but you can't skip decoding. The latest build of mplayer for maemo (mplayer_1.0rc1-maemo.10) accesses framebuffer directly, so its video output performance is comparable to that of N770. Unfortunately while cpu usage for video output reduced greatly to a reasonable level and is not a bottleneck anymore, video decoding performance is still a bottleneck and N800 is only about 30% faster than N770 for video (N800 handles 30fps videos in mplayer approximately the same as N770 handles 24fps videos). Surely, armv6 optimizations for video decoding can provide some improvement, but we have a long way of incremental improvements ahead. Did you try to do something about tearing in the next firmware? While I tried to workaround it, nothing could eliminate it completely but only resulted in some additional slowdown. So the latest build of mplayer has tearsync completely disabled and is optimized for performance only. It goes without saying that we will have to do something about it in the the future for sure. Is IVA really unusable on N800? What kind of cpu does it have inside? If it is done by TI, we can probably suppose that it is TMS320C64x (at least I have seen information that IVA2 is a lower clock and more power efficient version of DaVinci which uses TMS320C64x). ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Xvideo support for Nokia 770?
Hello, It would be probably a good idea to discuss different possibilities for improving multimedia support on 770/N800. Now we have a fast JIT scaler that runs on ARM core, it solves all the video resolution related performance problems. I'm going to work on improving quality, performance and its inclusion into upstream ffmpeg library, this task is in my nearest plans: http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2007-January/051209.html As for the ways of improving multimedia support on Nokia 770, it may be done in the following ways (in no particular order): 1. Continue ffmpeg optimizations (motion compensation functions, finetune idct, have a look at the possibilities to optimize codecs other than mpeg4 and its variants) 2. Implement Xvideo extension support for Nokia 770 (using scaling done on ARM core) 3. Implement XvMC in some way (using C55x DSP for it as it is supposedly good for IDCT and motion compensation stuff) 4. Improve GStreamer plugins (replacements for dspfbsink and dspmpeg4sink running on ARM core, it could probably improve mpeg4 playback performance a lot and allow using higher video bitrates and resolutions that are currently available in MPlayer) 5. Try to relay color format conversion and scaling to DSP. If it works as expected, video scaling can be done with almost zero overhead for ARM core. Theoretically the same trick could probably also work for GStreamer if video output sink can provide its own buffer (::buffer_alloc). The first step would be to try just doing nonscaled color format conversion. If it is successful, some more advanced stuff can be tried such as JIT dynamic code generation on C55x. 6. Try porting vorbis decoder (tremor) to DSP 7. Try porting libmpeg2 to DSP. With audio decoding and scaling done on ARM core, it might improve overall mpeg2 playback performance, I wonder if nonconverted DVD video playback is even theoretically possible on Nokia 770. That's quite a big list and it contains some things that might be generally nice to have, but have relatively low practical value and are actually not worth efforts implementing :) There are two issues that need to be solved for this all to become reality: 1. We need some way of applying community developed upgrades for core system components such as xserver and xlib (if we go after Xvideo support on Nokia 770). They must be easy to install by end users, otherwise this all development does not make much sense. It would be also nice to integrate these improvements into official firmware later, but I wonder if Nokia has spare resources for doing this integration and its quality assurance. 2. Reliable information that is detailed enough for performing graphics and audio output from DSP, see http://maemo.org/pipermail/maemo-developers/2007-February/007949.html ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
[maemo-developers] Fast 16bpp alpha blending (was: Improving Cairo performance on the N800)
On Thursday 18 January 2007 13:46, Gustavo Sverzut Barbieri wrote: > > By the way, free software is really poorly optimized for ARM right now. > > For example, SDL is not optimized for ARM, xserver is probably not > > optimized as well, a lot of performance critical parts of code in various > > software are still only implemented in C for ARM while they have x86 > > assembly optimizations long ago. Considering that Internet Tablets might > > have a tight competition with x86 UMPC devices in the near future, ARM > > poweded devices are at some disadvantage now. Is this something that we > > should try to change? :-) > > Yes. Since at INdT we use a lot of SDL, GTK and in future Evas, we are > interested in optimizing this. > > One thing that can be optimized is 16bpp operations. Moving SDL > surfaces to be optimized, packing 16bpp RGB into one plane and 1 > byte-Alpha in another plane, we could use multiple store (stm) and > improve things a bit. > > If we could achieve ~24fps blitting fullscreen 16bpp+Alpha, it would > rock! :-) Right now we do 18fps, but we still need that function with > separated planes + stm. I'll ask Lauro to send them as soon as we get > it working. > > Anyone willing to help evas port to work with 16bpp+Alpha internally? > Evas is a great canvas, can interoperate with Glib main loop easily > and provides high level utilities, like text layout (pango-like), > gradients, the concept of objects to animate and is scriptable really > easy (with optimizations!). Regarding 16bpp alpha blending, I did some optimization (not involving assembly yet) for maemo build of ufo2000 [1]. The code which is currently used, is based on and extends RLE sprites from Allegro game programming library [2]. The sources can be found here: http://ufo2000.svn.sourceforge.net/viewvc/ufo2000/trunk/src/fpasprite/ The goal was to get support for drawing isometric tiles with the support of alpha channel (for fire, smoke, explosions, window glass, ...) and adjustable brigtness (for lighting effects on night missions simulation). The code works as allegro addon library and allows loading sprites from PNG files. It automagically detects presence or absence of alpha channel and converts images into optimal format which allows fast blending (for alpha channel) and store it in a compact form (for images without alpha channel). When blitting sprite, brightness ranging from 0 - 255 is used as an additional argument. The code uses C++ templates to support all the possible variants of bit depth and blending type (may be not a very good idea for the code intended for submission into C library later :) ). The trick used to speed up alpha blending was to store each pixel data in a special 32-bit representation with R, G, B and alpha channel arranged in a special way for better performance. So imagine that we have 16-bit pixel in RGB565 format and alpha channel. We convert it into this 32-bit preprocessed data according to the following algorithm: uint32_t convert_pixel(uint16_t rgb565, int alpha) { uint32_t n = (alpha + 1) / 8, x = rgb565; x = (x | (x << 16)) & 0x7E0F81F; return x | (n << 5); } Now if we need to do alpha blending (with some buffer in memory), we do the following (d - destination pixel data buffer, s - buffer with preprocessed 32-bit pixel data, w - number of pixels to blend, n - brightness level (0 - 32)): uint16_t *draw_alpha_dark_line16(uint16_t *d, uint32_t *s, int w, uint32_t n) { while (--w >= 0) { uint32_t x = *s++; uint32_t y = (uint16_t)*d; uint32_t result = (x >> 5) & 0x3F; x = ((x & 0x7E0F81F) * n / 32) & 0x7E0F81F; y = (y | (y << 16)) & 0x7E0F81F; result = ((x - y) * result / 32 + y) & 0x7E0F81F; *d++ = (result | (result >> 16)); } return d; } This code works quite fast (at the cost of some precision loss though). It is perfectly suitable for isometric tile based games and probably other applications which only need lightning fast blending and do not need any extra operations with sprites (rotation for example). Removing brightness level support makes the code even faster. Using this code in ufo2000 allows it to keep reasonably high framerate (more than 10 fps) even on complicated scenes full of fire and smoke animation for example. I hope this information may be useful for other maemo game developers or anyone in need of fast 16bpp alpha blending code. PS. Optimizing alpha blending using assembly can most likely improve performance even more :) [1] http://ufo2000.sourceforge.net [2] http://alleg.sourceforge.net ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Xvideo support for Nokia 770?
On Wednesday 10 January 2007 01:51, Charles 'Buck' Krasic wrote: > Siarhei Siamashka wrote: > > Actually I have been thinking about trying to implement Xvideo > > support on 770 for some time already. Now as N800 has Xvideo > > support, it would be nice to have it on 770 as well for better > > consistency and software compatibility. > > As you may recall, I was considering this back in August/September. > I tried a few things, and reported some of my findings to this list. > The code for all that is still available here: > http://qstream.org/~krasic/770/dsp/ Yes, sure I remember. Thanks for doing these experiments and making the results available. It really helps to have more information around. > > I see the following possible options: > > > > 1. Implement it just using ARM core and optimize it as much as > > possible (using dynamically generated code for scaling to get the > > best performance). Is quite a straightforward solution and only > > needs time to implement it. > > It is my impression that this might be the most attractive option. > I noticed that TCPMP which seems to be the most performant player for > the ARM uses this approach, and it is available under GPL, so it may > be possible to adapt some of its code. > > In the long run, I would hope that integrating TCPMP scaling code into > libswscale of the ffmpeg project might be the most elegant approach, > since that seems to be the most performant/featureful/widel adopted > open-source scaling code (but not yet on ARM). For mplayer, it works > out of the box, since libswcale actually originated from mplayer, and > only recently migrated to ffmpeg. I see, thanks for the information (I checked TCPMP sources some time ago, but was interested in runtime cpu capabilities detection code and did not look at the scaler that time). Using TCPMP code may be an interesting option. But I also still may try to make my own scaler implementation for two reasons: 1. TCPMP is covered by GPL license, and most parts of ffmpeg are LGPL, so probably it makes sense making a clean room implementation of JIT powered scaler for ARM under LGPL license 2. I'm worried about the performance. Knowing how the cache and write buffer work on arm926 core, it is possible to tune generated code for it and get the best performance possible. So the results can be better than for TCPMP. I have just committed some initial assembly optimizations for unscaled yuv420p -> yuyv422 color format convertor to maemo mplayer SVN. It already provides some performance improvement, for example on my test video file (640x480 resolution, 24 fps) I get the following results now: BENCHMARKs: VC: 114.526s VO: 21.055s A: 0.000s Sys: 1.582s = 137.163s BENCHMARK%: VC: 83.4962% VO: 15.3503% A: 0.% Sys: 1.1535% = 100.% We can compare it with the older results (decoding time was also improved a bit since that time because of recent assembly optimizations for dequantizer): http://maemo.org/pipermail/maemo-developers/2006-December/006646.html BENCHMARKs: VC: 121.282s VO: 31.538s A: 0.000s Sys: 1.577s = 154.397s BENCHMARK%: VC: 78.5517% VO: 20.4267% A: 0.% Sys: 1.0216% = 100.% Most of the speed improvement in color conversion and video output (VO: part) is gained just from loop unrolling and avoiding using some extra instructions as gcc does when compiling C code, but using STMD instruction to store 16 bytes at once at aligned location [1] provides at least 10% performance here. If we estimate memory copy speed here with additional colorspace conversion applied, it is about 70MB/s now for 640x480 24 fps video (though we need to read a bit less data than write here, so it is a bit different from memcpy). And I have observed peak memcpy performance about 110MB/s on Nokia 770. So this color convertor is quite close to memory bandwidth limit now. This code can be optimized more by processing two image lines at once, so we can get rid of some data read instructions and improve performance. Also experimenting with prefetch reads may provide some improvement. JIT generated code should have a bit worse performance, but not much. It we decide to make 'nearest neghbour' scaling, the result should be probably as fast as this nonscaled conversion. But I want to try some simplified variation of bilinear scaling: each pixel in the destination buffer is either a copy of some pixel in the source buffer or an average value of two pixels. This way it should only introduce two extra instructions for each byte in output at maximum: addition of two pixel color components and right shift. > > 2. Try using dsp tasks that already exist on the device and are > > used for dspfbsink. But the sources of gst plugins contain code > > that limits video resolution for dspfbsink. I wonder if this check > > was introduced artifi
Re: [maemo-developers] Improving Cairo performance on the N800
On Tuesday 16 January 2007 12:08, Zeeshan Ali wrote: > > Now, the recently announced Nokia N800 is different from the 770 in > > various ways that are interesting for Cairo performance. I've got my > > eye on the ARMv6 SIMD instructions and the PowerVR MBX accelerator. > >Yeah! me too. The combined power of these two can make it possible > to optimize a lot of nice free software out there for the N800 device. > However! while former is fully documented and the documentation is > available for general public, it doesn't have a lot to offer. ARMv6 > SIMD only operate on 32-bit words and hence i find it unlikely that it > can be used to optimize double fp emulation in contrast to the intel > wirelesss MMX, which provides a big bunch of 128-bit (CORRECTME: or > was it 64- bit?) SIMD instructions. OTOH, these few SIMD instructions > can still be used to optimize a lot of code but would it be a good > idea for cairo if you need to convert the operand values to ints and > the result(s) back to float? Well, OMAP2420 seems to support floating point in hardware, so all this stuff is probably not needed anymore :) > I have already been thinking on utilizing ARMv6 before the N800 was > release to public. My proposed plan of attack for the community (and > also the Nokia employees) is simply the following: > > 1. Patch GCC to provide ARMv6 intrinsics. (1 MM at most) > 2. Patch liboil [1] to utilize these intrinsics when compiled for > ARMv6 target (1-3 MM) > 3. Make all the software utilize liboil wherever appropriate or ARMv6 > intrinsics directly if needed. > >The 3rd step would ensure that you are optimizing your software for > all the platforms for which liboil provides optimizations. OTOH! one > can skip step#1 and write liboil implementations in assembly. > >I already did a little progress on this and the result is two > header files which provides inline functions abstracting the assembly > instructions. I am attaching the headers. One of my friend was > supposed to convert them to gcc intrinsics and patch gcc but i never > got around to finish them. However I am attaching the headers so > anyone can use it as a starter if he/she likes. According to my tests, performance improvement from using such header files is minimal. They are easy to use, but the improvement is generally not very good. When I benchmarked idct performance, I also tested C implementaion with some macros for fast armv5te 16-bit multiplication out of curiasity. Performance improvement was only about 5%. While at the same time, handcrafted code improves performance by as much as 50% (and still has potential for more optimizations): http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2006-September/045837.html The very similar minimal effect is obtained from using such macros in ffmpeg mp3 decoder. The explanation is simple. Compiler is not able to shedule instructions as good as human especially if it has some 'alien' parts of code inserted in the flow of its instructions via inline asm. For example, this multiply instruction takes 1 cycle to execute, but the result has 1 extra cycle latency (for ARM9, it is even higher for ARM11 and is equal to 2 cycles) and you can't use it immediately in the next instruction. As gcc does not know about the sheduling of such instructions when using just macros, it may try to use the result immediately and suffer form 1 or more cycles penalty because of pipeline interlock. So if really good performance is required, nothing can beat handcrafted assembly yet. Of course it makes sense to profile code and optimize only time critical relatively small leaf functions. By the way, free software is really poorly optimized for ARM right now. For example, SDL is not optimized for ARM, xserver is probably not optimized as well, a lot of performance critical parts of code in various software are still only implemented in C for ARM while they have x86 assembly optimizations long ago. Considering that Internet Tablets might have a tight competition with x86 UMPC devices in the near future, ARM poweded devices are at some disadvantage now. Is this something that we should try to change? :-) ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On Sunday 14 January 2007 20:11, Frantisek Dufka wrote: > Marius Gedminas wrote: > > On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote: > >> On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote: > >>> Also Nokia 770 runs not at 220MHz as stated on your page, but at > >>> something closer to 250MHz as shown by this test code program > >>> (and confirmed to be actually 252MHz by somebody from Nokia > >>> on #maemo about half a year ago). > >> > >> So http://maemo.org/faq/faq.html#faq-N10129 is lying? Well, if I were to create a conspiracy theory, I would suggest that it could be done on purpose to make N800 look like a bigger improvement when comparing it to 770 ;-) But most likely it is just a typo, a lot of new docs became available lately, so they may contain some minor inaccuracies. > > The OMAP1710 page from Texas Instruments also claims 220 MHz is the > > maximum frequency: > > Check /proc/omap_clock on device, it says 252Mhz for both ARM and DSP core. Hmm, interesting. Can anybody check /proc/omap_clock on N800 device? I'm particularly curious about DSP clock frequency (as it can be actually lower than on 770). ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On Saturday 13 January 2007 21:00, Kalle Vahlman wrote: > We have all sorts of funny hardware at the office, so I thought I'd > make a quick run of cairo-perf with the Cairo 1.3.10 snapshot and see > how they relate to each other. > > There's some funny things I encountered in the results, and I hope > people on both lists can offer insights on why. > > Details at > > http://syslog.movial.fi > > but let's just say that the results were predictable in general, with > some surprises: > > N800 is naturally faster than 770, but I didn't expect the xlib > backend to have so big differences between the two. Maybe these devices were just running different linux kernels (task sheduler may be different) and xservers? So quite a lot of code could be different and these results can't be used to compare these cpus directly. > For the cairo audience there's the question of the tessellation > process, can it really be so fast on the PXA-320 or is there a bug > somewhere that twists the results? What could be so good in PXA-320 > (or not-good on the other devices) that the results are so drastic? What is the amount of cache on all these devices? If PXA-320 has more cache and all the necessary code/data for this test fit it but not on the competing device, that could explain the difference. By the way, here you can take some code for benchmarking cpu clock frequency: https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libavcodec/tests/testfreq.c?root=mplayer&view=markup It performs two test runs, the first run contains a loop with 10 add instructions, the second run just contains the same loop but empty. Substracting time of the second run from the time of the first run we get the time of executing these add instructions only. Number of such instructions executed per second can be used to measure cpu clock frequency. For getting best precision you may want to increase TESTS_COUNT define, it will result in a longer test time though. This test program can show results a bit lower than the actual clock frequency (as we have a multitasking OS and other processes also take some time). But real cpu clock frequency can't be lower than the result benchmarked :) Even for superscalar cpus, these add instructions can't be run in parallel as each new instruction depends on the result of the previous one (hmm, just thought that the last add instruction in a loop can be run in parallel with subs which decreases loop counter, maybe some additional tweak will be required). Also Nokia 770 runs not at 220MHz as stated on your page, but at something closer to 250MHz as shown by this test code program (and confirmed to be actually 252MHz by somebody from Nokia on #maemo about half a year ago). As for optimizing code for ARM (targeting Nokia 770), there are a few things that are slow (maybe this list is still incomplete): 1. Floating point math is slow without vfp (cairo contains a lot of fp math) 2. Integer division is slow ('/' and '% operators) as ARM does not have hardware instruction for it and much less efficient software implementation is used. 3. write access to noncached memory is slow for read-allocate cache on arm926 core (data is not loaded into cache on write), see more details here: http://maemo.org/pipermail/maemo-developers/2006-December/006579.html I have some crude patch for valgrind (callgrind part) to simulate read-allocate cache behaviour (instead of write-allocate as is simulated by default), it can show parts of code which have lots of cache misses. If anybody is interested, I can try to clean it up and submit upstream: http://ufo2000.xcomufo.com/maemo/vg-read-allocate-cache-patch.diff I also had a quick look at cairo sources (without benchmarking it, just to see general coding style). Some parts of code in it are not optimal. For example this code chunk from cairo-path-stroke.c relies on integer division (it is unlikely to cause severe performance decrease here, but may become a real problem for tight loops): [cut] for (i=start; i != stop; i = (i+1) % pen->num_vertices) { tri[2] = f->point; _translate_point (&tri[2], &pen->vertices[i].point); _cairo_traps_tessellate_triangle (stroker->traps, tri); tri[1] = tri[2]; } [/cut] If we go deeper into _cairo_traps_tessellate_triangle, we will notice the following: [cut] memcpy (tsort, t, 3 * sizeof (cairo_point_t)); qsort (tsort, 3, sizeof (cairo_point_t), _compare_point_fixed_by_y); [/cut] There is unnecessary memcpy operation, also qsort is called for just three elements! And such performance bottlenecks are quite easy to spot almost everywhere. Most likely the code that is performance critical, is optimized a lot better, but anyway at least this part deserved a comment such as /* I know that it is slow, but this code is not performance critical and I'm too lazy to optimize it */ :-) Anyway, now I see no surprise that such huge improvements were possible recently
[maemo-developers] N800 Developer Device Program Recommendation
Hello All, After reading quite a number of applications posted and in order to add some diversity here, I decided to actually post a *recommendation* here :) I hope that Mans Rullgard can be considered for inclusion into the list of the developers eligible for getting discount code. He is a maintainer of many parts of ffmpeg and is quite an active contributor (see MAINTAINERS file [1] from ffmpeg distribution and ffmpeg SVN changelog [2], he is mentioned as "mru" there). But more importantly, he has already contributed to maemo by implementing ARMv5TE optimized idct code [3], which improves performance of MPlayer and most likely also built-in player (ffdec_mpegvideo element from Nokia 770 gstreamer stack). The part "Local playback of MPEG video improved" from the second OS2006 release notes [4] probably mentions this particular improvement. I'm just worried that he could get out of the sight of those who are responsible for selection of eligible developers for this program. Having some core ffmpeg developers (or at least one of them) as developer device program participants can have a very positive effect on multimedia support capabilities of Nokia 770/800 or any future maemo devices (both in performance and in the number of properly supported codecs and video formats). And N800 device is interesting for video decoding optimizations as it supports ARMv6 SIMD instructions, so it got some potential here. [1] http://svn.mplayerhq.hu/ffmpeg/trunk/MAINTAINERS?view=markup [2] http://svn.mplayerhq.hu/ffmpeg/trunk/?view=log [3] http://svn.mplayerhq.hu/ffmpeg/trunk/libavcodec/armv4l/simple_idct_armv5te.S [4] http://europe.nokia.com/link?cid=PLAIN_TEXT_48892 ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Xvideo support for Nokia 770?
On Tuesday 09 January 2007 20:59, Charles 'Buck' Krasic wrote: > Any chance the Xvideo support in the Bora 3.0 will turn up in a 770 OS? I asked the same question on #maemo irc channel and daniels explained that video scaling is done by gpu on N800, so probably the same code can't be reused on 770: https://mg.pov.lt/maemo-irclog/%23maemo.2007-01-08.log.html Actually I have been thinking about trying to implement Xvideo support on 770 for some time already. Now as N800 has Xvideo support, it would be nice to have it on 770 as well for better consistency and software compatibility. I see the following possible options: 1. Implement it just using ARM core and optimize it as much as possible (using dynamically generated code for scaling to get the best performance). Is quite a straightforward solution and only needs time to implement it. 2. Try using dsp tasks that already exist on the device and are used for dspfbsink. But the sources of gst plugins contain code that limits video resolution for dspfbsink. I wonder if this check was introduced artificially or it is the limitation of DSP scaler and it can't handle anything larger than that. Also I wonder if existing video scaler DSP task can support direct rendering [2]. It would need to support arbitrary number of memory mapped buffers for video output in order to avoid unnecessary memcpy, otherwise performance will suffer. Maybe we can ask Nokia developers to provide some information about the internals of these plugins. The most important questions are: * What are the real capabilities of DSP based scaler, can it be used for resolutions let's say up to 800x480? * Where is the screen update performed after dsp has finished scaling/converting video from mapped buffer to framebuffer? Is it done on ARM side, or probably screen update can be also triggered from DSP directly? * Is it possible to get direct rendering [2] support with existing dsp tasks on 770? If not, would it be too hard to implement this feature? * How are timestamps handled in dsp? Is it possible to just send a one shot signal to dsp task for rendering video frame from a mapped buffer as fast as possible? A brief dsp interface description would be welcome. Maybe some questions may be trivial, but unfortunately I did not have much time for a detailed walk through the sources in order to figure out how this all works. If any Nokia developer finds time for some short answers, it would really help a lot. 3. Try implementing a new DSP based scaler from scratch. The most important thing to know is how to access framebuffer directly from DSP and move data to it from mapped buffer without any overhead. The first test implementation can just perform nonscaled planar YV12 -> packed YUV422 conversion, if it proves to be fast and useful, it could be extended to also support scaling. PS. This is unrelated to Xvideo support development, but also it would be nice to have more or less detailed description of dsp based gstreamer elements and their properties. While the sources of these plugins are available (with a hidden dsp part), some docs are needed to know how they are supposed to work in order to use them efficiently and probably improve. [1] http://repository.maemo.org/pool/scirocco/free/source/g/gst-plugins-dsp0.10/gst-plugins-dsp0.10_0.32.1-1.tar.gz [2] http://www.mplayerhq.hu/DOCS/tech/dr-methods.txt ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Nokia's Linux-powered N800 Internet Tablet sneaks out early
On Monday 08 January 2007 09:02, Komal Shah wrote: > Ok. Let's fun begin. > > It looks like OMAP2420 only. As per the development track on > linux.omap.com kernel mailing list the product may _NOT_ be using the > IVA1.0 processor. Basically OMAP2420 from the multimedia point of view > is a MPEG4 device. I hope you know that Nokia N93 also uses the > OMAP2420 processor doing cool mpeg4 encode at VGA 30fps, but that's on > IVA1.0 not on ARM. So, if N800 is not using IVA, then don't expect > good multimedia experience like N93. > > BTW, I am slowly and steadily working on the IVA driver components, > but we don't see whole lot of drivers on that right now. Also the following line from dmesg log caugth my eye on #maemo irc channel: [8044.496856] omapfb omapfb: s1d1374x: setting update mode to disabled Does N800 also use Epson video chip just like N770? Could it be be some upgraded version? Something like S1D13745 chip with hardware scaling support: http://www.erd.epson.com/index.php?option=com_docman&task=cat_view&gid=38&Itemid=40 OMAP2420 is supposed to have 2D/3D accelerator on chip. Is it disabled and can't be used at all (using Epson chip instead for better compatibility with existing software developed for N770)? Also I wonder about DSP clock frequency for both N770 and N800. ARM core is clocked at 250MHz in N770 and 330MHZ in N800. TI docs specify 220MHz ARM core and 220MHz DSP core in OMAP1710, while OMAP2420 is specified to have 330MHz for ARM core and 220MHz for DSP. Was Nokia 770 clocked at 250/250 or 250/220? Is the clock frequency for N800 equal to standard 330/220? As ARM core is supposed to have higher clock frequency than DSP and it contains additional useful SIMD instructions, is it a better choice than DSP for multimedia now (pretending that IVA does not exist)? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] maemo mplayer development and its possible future use on Nokia 770
On Tuesday 12 December 2006 01:57, you wrote: > My original goal of posting the previous message was an attempt to find a > volunteer who would like to try developing such a frontend. > > I don't have that much time to devote to mplayer development myself. Up > until this moment I even could not concentrate on solving some specific > task but tried some bits with MP3 audio output, decoder improvements, GUI > and user interface, fixing arm specific bugs, and now video output code > with hardware YUV support. Also some kind of management work, integration > of useful patches and support for users in the forums takes some time. I > would like to concentrate on some task such as video decoder optimizations > for ARM, but seeing that other parts are not in a quite good shape, > distracts attention somewhat :-) Well, after reading this part again today, looks like it sounds a bit controversial, I'm sorry about it. Actually I'm not the only one who took part in porting and improving mplayer for maemo. Ed Bartosh created deb packages and Josep Torra optimized mpeg decoder for Nokia 770. Also some patches were taken from AGAWA Koji's Zaurus port of mplayer. Not to mention numerous upstream developers of mplayer and ffmpeg who developed this nice piece of software and were very helpful (especially Mans Rullgard who developed initial version of armv5te optimized idct code). Also there were many people who provided useful information and valuable comments. Consider it just as my regret for not being able to contribute to mplayer development as much as I should, but not a rant about 'nobody is helping' :-) I think we still really need a proper credits tab in mplayer GUI. But I expect that having more people contributing to maemo mplayer development can provide some very nice results. So feel free to join. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] maemo mplayer development and its possible future use on Nokia 770
On Monday 11 December 2006 11:26, Frantisek Dufka wrote: > Yes, the result would look like video overlay works in windows or linux > on PC - overlay draws over different windows when it shouldn't :-) We > can live with that. I thought it is actually not a problem but quite a good thing :-) Surely for mplayer as a standalone video player, supporting keboard/initializing some window is important. But if it just outputs video into some rectangular screen area (provided by some other application) and is controlled via issuing commands through a pipe, it makes possible to develop some advanced frontends which use mplayer as a video rendering engine. For example a twin of the standard Nokia 770 video player which simulates all its GUI controls could be created. My original goal of posting the previous message was an attempt to find a volunteer who would like to try developing such a frontend. I don't have that much time to devote to mplayer development myself. Up until this moment I even could not concentrate on solving some specific task but tried some bits with MP3 audio output, decoder improvements, GUI and user interface, fixing arm specific bugs, and now video output code with hardware YUV support. Also some kind of management work, integration of useful patches and support for users in the forums takes some time. I would like to concentrate on some task such as video decoder optimizations for ARM, but seeing that other parts are not in a quite good shape, distracts attention somewhat :-) > As for framebuffer permissions, it may be better to > relax device permissions than to run mplayer as root. The most right way to solve this issue is probably to add 'user' to 'video' group. Alternative solutions involve messing with mplayer binary ownership and suid/sgid bits. I wonder what is possible to do automatically in the least intrusive way when installing mplayer package? > Well, the conversion is done on the fly while the data is transferred to > internal epson video buffer. I guess it would be hard to do planar YUV > -> RGB without additional memory. I still don't understand how it is > done on the fly even in those packed formats since some color > information (U,V) is common for more lines. Seems like tough task. There > needs to be additional memory for remembering U,V parts from previous line. YUV422 is a good format as it matches 16-bit RGB format quite well. Both of them use 16 bits per pixel, and YUV422 encodes each pair of pixels into a stride of 4 bytes (16-bit RGB encodes each pixel into 2 bytes, but you can also treat it as 2 pixels in 4 bytes). So we can mix YUV422 and RGB data in a framebuffer quite conveniently. > > Another interesting possibility is to relay video scaling and color > > conversion (planer -> packed YUV) to DSP. > > I'm not sure, is there some math involved in this or it is just memory > shuffling? I guess DSP would be really bad for memory shuffling. From > previous discussions it looks like when you add DSP to the mix all kinds > of bottlenecks appears. I wonder if gstreamer/dspfbsink could keep up > with mplayer speed doing just conversion and video output. Actually DSP may be a good choice for scaling, if you check the same spru098.pdf you will find "Pixel interpolation Hardware Extension" part :-) Also looks like dspfbsink uses DSP for scaling as it provides a mapped memory for planar YV12 data (or its variant) and accepts a command to do the rendering. I looked through xserver sources and gst plugins to dig for information and I think I got some impression about how they work, but I think this all deserves a separate post along with some additional inquiries addressed to Nokia developers :-) ARM can perform YV12->YUV422 conversion quite fast if properly optimized, I even suspect that it can provide a throughoutput comparable to memcpy (as memory controller/write buffer performance is a limiting factor here and some data shuffling will not make much difference). The benchmarks in my previous message use standard color conversion/scaling code from mplayer which is not optimized for ARM. But just color format conversion is a special case, sometimes scaling is required and mplayer scaler is rather slow. Scaling performed by mplayer was completely unusable for RGB target colorspace with x11 driver, that's why maemo build of mplayer had fallback to SDL when playback for scaled video was required. Now with the target colorspace YUV422, it is slow but still usable and a bit better than SDL. If we want a fast scaler for ARM, using JIT is a good option (and I have some experience in developing JIT translator for x86). Anyway, I hope that by using DSP for scaling and running it asynchronously, it is possible to reduce ARM core usage to almost zero and keep all the resources for video decoding. A related interesting observation is that screen update ioctl does not seem to affect performance at all (commenting it out does not improve performace and naturally w
[maemo-developers] maemo mplayer development and its possible future use on Nokia 770
Hello All, I have just uploaded a new build of mplayer (mplayer_1.0rc1-maemo.3) to garage, which implements some experimental and not yet clean video output method using hardware YUV colorspace and direct access to framebuffer. It's not quite usable as framebuffer access is not allowed when running mplayer as ordinary user (framebuffer device is owned by root:video with 660 permissions). Also right now there are problems with keyboard input and other applications may cause some flicker effect (for example clock applet or google search applet overlap fullscreen video when thay are redrawn). Surely all these problems can be fixed by implementing hybrid x11/framebuffer code where x11 is responsible for keyboard input and sets video mode so that no other application draws over a screen area used by mplayer. But having a plain framebuffer access creates an interesting possibility. Looks like mplayer can coexist with other applicaitons nicely and if they provide some rectangular area for video output, mplayer can be used from them in slave mode (http://www.mplayerhq.hu/DOCS/tech/slave.txt). Mplayer just needs to be extended to accept some command line option or slave option to specify/change screen region that it will use for video playback. Also I noticed that there is some initiative to control video players using d-bus: http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2006-December/048067.html So maybe someone can develop a more advanced gui frontend for mplayer with all the eye candy and convenient gui controls. Let me know if you are interested in this idea and need some help or more information from me. Also thanks to Frantisek Dufka for Epson chip documentation, having full and reliable information adds a certain level of confidence and encourages development (I needed a confirmation that Epson chip supports only packed YUV formats and no planar formats are really available). Next come current benchmark results (mplayer_1.0rc1-maemo.3) showing some recent improvements, you can skip this part if you are not interested. Tested with the following video file (its first 100 seconds): VIDEO: [DIV3] 640x480 24bpp 23.976 fps 779.1 kbps (95.1 kbyte/s) Without hardware YUV colorspace support (-vo x11): # mplayer -endpos 100 -benchmark -vo x11 -nosound -quiet ... SwScaler: using unscaled yuv420p -> bgr565 special converter BENCHMARKs: VC: 122.215s VO: 90.458s A: 0.000s Sys: 1.769s = 214.442s BENCHMARK%: VC: 56.9918% VO: 42.1831% A: 0.% Sys: 0.8250% = 100.% Now using framebuffer output with YUV422 colorspace (-vo nokia770): # mplayer -endpos 100 -benchmark -vo nokia770 -nosound -quiet testfile.avi ... SwScaler: using unscaled yuv420p -> yuyv422 special converter BENCHMARKs: VC: 121.282s VO: 31.538s A: 0.000s Sys: 1.577s = 154.397s BENCHMARK%: VC: 78.5517% VO: 20.4267% A: 0.% Sys: 1.0216% = 100.% VC - is the raw time it took to decode this video fragment VO - is the time it took to display it on screen (performing color conversion) = - is a total time including decoding and displaying (preferably this number should stay below 100 seconds is we want to play this video in realtime) So no playback for 640x480 videos yet, but we got a bit closer to it (and lower bitrate/resolution videos should be supported better with less battery power consumption). Another interesting possibility is to relay video scaling and color conversion (planer -> packed YUV) to DSP. This method is used by dspfbsink gstreamer plugin and another nice feature is that dsp tasks are accessibe for ordinary user. But I'll post some more details and thoughts a bit later in another message. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
[maemo-developers] Optimized memory copying functions for Nokia 770 (final part)
Hello All, Here is an old link with some benchmarks and initial information: http://maemo.org/pipermail/maemo-developers/2006-March/003269.html Now for more completeness, memcpy equivalent is also available and the functions exist in two flavours (either gcc inline macros, or just assembly code), all the sources are here: https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/fastmem-arm9/?root=mplayer The easiest way to try this code is just linking 'fastmem-arm9.S' with your code, it will override glibc 'memcpy' and 'memset' functions with this optimized implementation. But it will probably not affect code that is contained in other shared libararies, for example SDL will still most likely use functions from glibc. If you decide to try using gcc inline macros, it may be not safe, beware of compiler bugs, more details and testcases are here: https://maemo.org/bugzilla/show_bug.cgi?id=733 Anyway, this code may be useful for various games, emulators or any software that may need to clear/initialize or copy large memory blocks fast. So those who are interested, may scavenge something useful there :) At least adding a variation of this this code to allegro game programming library for bitmaps blitting/clearing functions allowed to improve framerate in ufo2000 quite a lot. Sure, that's because of nonoptimal full screen update method which is not very fast and battery friendly anyway and should be changed to screen updates only for the parts of screen that were changed. But sometimes you may have to update full screen anyway, for example when you have it filled with fire and smoke animation. So having fast bitmaps blitting code and being able to just update full screen and have no problems with performance may be a good thing. Technical explanation (at least my understanding of it) is the following. Nokia 770 cpu has some small amount of write back cache, but it is not write allocate. That means if some memory block is already cached, write operation is fast and data is stored immediately to cache. But if some memory block is not cached, it can get to cpu data cache only after read operation, but not write (read allocate cache behaviour). If destination buffer in not in cache, write to it will be performed directly to memory using write buffer. Transfers to memory are performed using blocks of 4, 16, or 32 bytes and these blocks should be aligned. See '5.7 TCM write buffer' and '6.2.2 Transfer size' from http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf So if you write to memory one byte at once, memory bandwidth is wasted (you get only one byte written per memory bus transfer operation, while you could easily get 4 bytes written instead). Here is the worst possible memcpy implementation for example, if you benchmark it, you will get some interesting numbers: void memcpy_trivial(uint8_t *dst, uint8_t *src, int count) { while (--count >= 0) *dst++ = *src++; } But the best performance is achieved when using 16 bytes transfers (aligned at 16 bytes boundary, otherwise it will be just split into some 4 byte transfers). This can't be coded in C, and the use of assembly STM instruction with 4 registers as operands is needed (or any number of registers that is multiple of 4). ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] WMA streaming
On Thursday 23 November 2006 21:02, you wrote: > On Monday 13 November 2006 22:08, Andrew Barr wrote: > > What are my options for Windows Media Audio streams on the 770? Most > > Internet radio streams (I mean simulcasts of real broadcasts in this > > sense) are offered in (at least) this format, which is supported by free > > software (ffmpeg) up to version 2, the latest version I've ever seen > > anywhere. However, no one seems to have had any luck with third-party > > codecs on this device, much less for streaming. Is this device powerful > > enough to handle decoding audio streams using C or ARM-assembly codecs > > (from ffmpeg)? It's likely that the bitstream would be 16 to 24 kbps. I > > understand free tools for the DSP aren't quite there yet, so that may not > > be an option. > > FFmpeg library contains WMA decoder and MPlayer supports it at least on > x86 desktop PC. There are some problems with its support on ARM though. > > First and the most easy to fix is that WMA decoder seems to have some > alignment problems, so it crashes on any attempt to play WMA files. This is > quite easy to workaround by running 'echo 3 > /proc/cpu/alignment' as root, > see https://maemo.org/maemowiki/PortingFromX86ToARM and the links at it > for more information. > > A major problem with WMA decoder is that it uses floating point math. And > having no FPU in hardware, Nokia 770 does not seem to be able to decode > 128kbps WMA files to play them in realtime (sound is skippy). So it is not > very useful unless somebody finds (or implements) a fixed point WMA > decoder. > > However I also tested 20kbps WMA sample and it worked fine with CPU load > at about 60%. So if anybody would like to have such low bitrate WMA > files/streams supported, I can have a look at these alignment problems, fix > them and release updated version of MPlayer for everyone to use. Well, the latest maemo build of MPlayer now supports WMA audio (with that extremely inefficient cpu usage and unability to play high bitrates). But it can play internet radio streams, for example running the following works: # mplayer mms://wm05.nm.cbc.ca/cbcr1-calgary-low On the other hand, standard audio player seems to support WMA files playback just fine :) Does it have problems with internet radio? Or you just tried to look for free alternatives? What was the reason for asking this WMA streaming question in the first place? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Bluetooth headset?
On Saturday 25 November 2006 19:05, George Farris wrote: > On Sat, 2006-25-11 at 18:34 +0200, Stefan Kost wrote: > > Hi George, > > > > To make it a bit more clear: If there are plans, we would not be allowed > > to talk about it :(. So you'll have to wait. But be assured it if is > > technically doable its quite likly that we are going to support it. Now > > please let it rest. > > Interesting, this suggests you are somehow or other connected with > Nokia, I wasn't aware of that. Yes, I also noticed that only recently as there's Stefan's name in the copyright statements from the now available sources of Nokia 770 gstreamer plugins. Now the previous Stefan's posts look in a bit different light to me and that's a good thing. I guess, we are just spoiled by the fact that many (almost all?) Nokia developers here have '@nokia.com' in their e-mail address :) ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Unresolved issues (Week 46)
On Thursday 23 November 2006 22:19, Charles 'Buck' Krasic wrote: > No, my dsp work was actually video related.I did reuse Siarhei > Siamashka's mplayer code to decode/output mp3 directly, but that > obviously doesn't help with speex. Just a disclaimer before anybody starts bashing my ugly hack that allows to use dspmp3sink for mp3 playback from mplayer :) I know that it is improper use of gstreamer api for audio synchronization, but anything that looked somewhat better (proper buffer timestamps, the use of gstreamer pipeline clock instead of system time, ...) appeared to work even worse in my tests. So the code that is currently used in mplayer is bad, but the other options seemed to be even worse :( Surely it was not the best experience to start getting familiar with gstreamer and I'm surely not going to start looking for the one who is at fault here, there is a high probability that I missed something important and it is me after all ;) Anybody who can fix gstreamer based output module for mplayer is welcome to submit a patch. The only excuse for keeping this bad code in maemo build of mplayer is that it provides some performance improvement and works quite acceptable (audio/video sync is ok) most of the time. Now as the sources of gstreamer plugins are available, they may provide some insights about how to use them better and what could be wrong. > I'd suggest that the most practical approach for now would be to have > an application that uses a speex dsp task to decode speex, and then > takes the output from that speex task and routes it to an existing > gstreamer plugin for pcm output. This may be suboptimal, as the > data will cross the dsp gateway boundary twice more than necessary, > but it still might retain most of the benefit of offloading speex work > to the dsp.I mean were talking something like 64KB/s of extra > copying in the worst case (?), which I don't think will be a very > significant cost even on the 770's OMAP processor. > > The marginal benefit of persuing a zero copy solution (direct from dsp > to sound) just probably isn't work the effort. Documentation for the > software components of the 770 that use the dsp is virtually > non-existent until now. Aside from the mp3 decoder, I think all of > the other stuff has been basically unavailable to developers outside > of those working on Nokia's closed source multimedia applications. > On the bright side, the gstreamer plugins for these various pieces has > been made open source in maemo 2.1. I wouldn't hold my breath on > the dsp side of these plugins ever becoming open source (although I > would wholeheartedly welcome it!). It would be very nice to have the sources (maybe some stripped down version) of C55x stuff that is used by dsppcmsink as a template for implementing thirdparty dsp based decoders. But maybe I'm asking for too much :) ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] WMA streaming
On Monday 13 November 2006 22:08, Andrew Barr wrote: > What are my options for Windows Media Audio streams on the 770? Most > Internet radio streams (I mean simulcasts of real broadcasts in this sense) > are offered in (at least) this format, which is supported by free software > (ffmpeg) up to version 2, the latest version I've ever seen anywhere. > However, no one seems to have had any luck with third-party codecs on this > device, much less for streaming. Is this device powerful enough to handle > decoding audio streams using C or ARM-assembly codecs (from ffmpeg)? It's > likely that the bitstream would be 16 to 24 kbps. I understand free tools > for the DSP aren't quite there yet, so that may not be an option. FFmpeg library contains WMA decoder and MPlayer supports it at least on x86 desktop PC. There are some problems with its support on ARM though. First and the most easy to fix is that WMA decoder seems to have some alignment problems, so it crashes on any attempt to play WMA files. This is quite easy to workaround by running 'echo 3 > /proc/cpu/alignment' as root, see https://maemo.org/maemowiki/PortingFromX86ToARM and the links at it for more information. A major problem with WMA decoder is that it uses floating point math. And having no FPU in hardware, Nokia 770 does not seem to be able to decode 128kbps WMA files to play them in realtime (sound is skippy). So it is not very useful unless somebody finds (or implements) a fixed point WMA decoder. However I also tested 20kbps WMA sample and it worked fine with CPU load at about 60%. So if anybody would like to have such low bitrate WMA files/streams supported, I can have a look at these alignment problems, fix them and release updated version of MPlayer for everyone to use. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Maemo 2.0 device reboot
On Wednesday 18 October 2006 15:58, Amit Kucheria wrote: > On Wed, 2006-10-18 at 13:00 +0300, ext Marius Gedminas wrote: > > On Tue, Oct 17, 2006 at 11:50:03AM +0200, Malix wrote: > > > Hi, after upgrade to maemo 2.0 I have a problem. Some times my 770 > > > reboot. This happen some times when I'm using the browser and every > > > time I try to use Gizmo. For now I never had problem with other > > > programs. You think this is a software problem or hardware? > > > > My 770 reboots once every two--three days. I think it's a software > > problem. I could be wrong. > > As mentioned in the following link, please file bugs against the errant > applications. > http://maemo.org/maemowiki/ReportingRebootIssues?action=show > > I remember in your case it was FBReader - report it to them. Well, user mode applications not running as root are not supposed to crash the whole system and cause it to reboot. Still this instability and reboots issue seems like a problem with some core component of the system or even the kernel. Maybe something like: https://maemo.org/bugzilla/show_bug.cgi?id=677 Could anybody who still has maemo 1.x sdk installed and can recompile memtester for i, check if the problem is reproducible with memtester in IT2005? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Maemo 2.0 device reboot
On Wednesday 18 October 2006 16:26, Mike Frantzen wrote: > > On Tue, Oct 17, 2006 at 11:50:03AM +0200, Malix wrote: > > > Hi, after upgrade to maemo 2.0 I have a problem. Some times my 770 > > > reboot. This happen some times when I'm using the browser and every > > > time I try to use Gizmo. For now I never had problem with other > > > programs. You think this is a software problem or hardware? > > > > My 770 reboots once every two--three days. I think it's a software > > problem. I could be wrong. > > I think it's a DSP problem when the microphone is enabled. When writing > some audio programs using GStreamer the DSP will eventually start > sending back error codes, stop working, and eventually cause the whole > n770 to reboot. I had to do a lot of trial-and-error playing with how > the GStreamer dsmpcmsrc pipelines are built and re-used to cut down the > frequency of the problems. Never could get it completely reliable > though. Once the DSP is wedged you have to reboot (if it hasn't > spontaneously rebooted already). I can confirm Nokia 770 GStreamer implementation problems. It is used in MPlayer (in not a very clean way, but that's another story) in order to get mp3 audio decoded by DSP and reduce load on ARM core. I also observed system instability supposedly caused by starting/stopping audio playback: http://maemo.org/pipermail/maemo-developers/2006-August/005232.html Also most likely GStreamer is responsible for MPlayer lockups when trying to play relatively long flv files as observed here: http://www.internettablettalk.com/forums/showpost.php?p=22855&postcount=207 I know that IT2006 just switched to a new version of GStremer, and bugs can be expected. So I'm anticipating a bugfix release to check if some of the problems got fixed ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
[maemo-developers] maemo-apps.org application catalog
Hello All, What do you think about current wiki based application catalog? In my opinion, it is quite a large page already, slow to open and hard to navigate. As I see it now, maemo-apps.org looks like it may become an interesting resource for maemo community in the future. But it lacks content, badly. Most of the applications submitted there were added by 'Frank' who is the admin there. Yesterday I edited ApplicationCatalog2006 wiki page and added a notice with a suggestion for application developers to submit their applications to maemo-apps catalog as well, but this addition was reverted. I'm not suggesting to drop current application catalog, it would be just stupid. But trying some alternatives at the same time seems like a good idea to me. Of course maemo-apps.org is a third party maintained resource with no sources available. I would prefer something more open if it was available (or at least maintained by Nokia). Thoughts? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Is there a convenient way of switching between usb/bluetooth/wifi networking?
On Friday 22 September 2006 11:07, Kalle Vahlman wrote: > 2006/9/22, Siarhei Siamashka <[EMAIL PROTECTED]>: > > Hello, > > > > I wonder if there exists something like an applet for fast switching > > between usb ethernet/mass storage device modes. > > Yes, there is such an applet in the developer rootfs. Not sure where > it's available from though, someone else might know? http://www.maemo.org/platform/docs/howtos/howto_cpu_trans.pdf Yes, it is available even on 'normal' image and can be installed with apt-get: # apt-cache search maemo-dm Thanks a lot for sharing information. Now the only thing left to figure out is how to teach it not to unmount mmc card on plugging usb cable. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
[maemo-developers] Is there a convenient way of switching between usb/bluetooth/wifi networking?
Hello, I wonder if there exists something like an applet for fast switching between usb ethernet/mass storage device modes. Also it could provide some easy gui interface for setting up networking over bluetooth. Yes, I tried all these types of networking, but lately resorted only to wi-fi as it is the easiest one to use. If no such solution exists right now, maybe it is time to develop one? :) ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] RE: defective memory?
On Wednesday 20 September 2006 01:12, Olivier ROLAND wrote: > If your device is broken then mine is also. > I don't think at all that we speak about (small) fraction because > majority of users won't even notice the problem. > My device seem stable until I stressed it. And stressed it is not a > "condition suffisante" to make the problem happen. That's exactly the point. The device is quite usable and most users will not detect any difference on most common operations. It is a very good sign as looks like in order to get rock solid stability, we only need to allocate and lock the problematic memory page early at boot time and do not let any applications use it. > When I have time, I will make extensive test on my device to check > exactly when the problem occur. Please do it, now with the lastest version of the tester and 40MB tested block, the coverage is almost 2/3 of physical memory. If that's a certain location in memory, the chances that it can be easily detected are quite high. Please verify that the offset of faulty address within 1KB page is reported to be always the same between different runs (it is equal to 1a5 for me). I'm trying to find a way to get a full physical address of that page. In my last tests I managed to mmap '/dev/mem' (just using 'read' function segfaults), but did not have enough time to experiment with it much yet. > My doubt about "small fraction" are probably driven by the fact that I > was "hit" by 'white screen of death' 4 weeks after buying the device. > So I guess that during the reparation my 770 was checked (again) by the > conventional Nokia diagnostic. > I conclude that the conventional Nokia diagnostic doesn't detect the > problem. > > To make things clear, I don't want to make negative publicity at all. I > enjoy this device a lot and I've ported Streamtuner on it with lot of > great feedback from users. > > My 2 cents. I don't want to make negative publicity either. My only goal now is to find some reliable technical solution for both diagnostics and workaround of such problems. After all, I have a good motivation for that :) I'm grateful to Nokia as they are also trying to investigate the problem. I'm quite confident that we can come up with some solution, and it will have some positive effect for Nokia 770 community as a result. This is a new device, software and tools for it are still being developed. We are all learning and getting more experience. > PS: I don't know what is "the conventional Nokia diagnostic" but as far > as I know there is always a "conventional XXX diagnostic" in reparation > centers. By the way, when looking for additional information I found some Sharp Zaurus community forum and asked what they use for hardware diagnostics in the hope that I could use the same tools. Somebody replied me that hardware diagnostics tools are built in Zaurus firmware and are accessible from boot menu. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
[maemo-developers] RE: defective memory?
On 9/19/06, Kimmo Hämäläinen <[EMAIL PROTECTED]> wrote: Yes, it would need to be reproducible in several different devices. The guy here that tried to reproduce it currently thinks that Siarhei's unit is broken. Yes, I also think that the probability of my device being broken is quite high. A certain (small) fraction of other Nokia 770 owners are probably having the same problem. Does it make the device completely useless? Of course no, my device works almost fine, it only crashes and reboots sometimes, I also has filesystem corruption several times (now even switched mmc filesystem to ext3, don't know if it would help much though). So the device can be surely used as a book reader, internet browser and serve other tasks. Other (small) fraction of users who got 'white screen of death' were surely less lucky. What can be done about this if the defective memory problem gets confirmed. I see three possible ways: 1. 'Ignorance is a bliss' - just do nothing, those who don't know about the problem will not worry about it :) The device will just crash or reboot occasionally, some more unlucky users having more annoying crashes will complain in the forums providing some bad PR. 2. Distribute some diagnostics software that will help to identify memory problems and repair/replace defective units, that will have some expences, but will improve overall reliability and reduce the number of negative publicity. 3. Add some (un)official support for working around bad memory regions using technology something similar to BadRAM, in this case most of such units will be completely usable. In general, bad memory problem is quite common for x86 pc's, but there is an excellent tool for memory diagnostics - memtest86. It helped me quite a number of times, also I always advice everyone having stability issues to run it first. I don't know how the reliability of memory chips used in embedded devices compares to the reliability of memory from normal desktop computers, but bad memory seems to be one of the most frequently encountered hardware problems. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)
On 9/19/06, Frantisek Dufka <[EMAIL PROTECTED]> wrote: Just few ideas: software - bug is swapping/pagefault code?, bad ram timings?, too high CPU clock? That's an interesting idea, It seems to be worth trying to downclock the device and check if it improves stability. Does anybody know how to do this? hardware - high power requirements - does it happen more when brightness is high or mem tester is run in ssh over wi-fi? I run all my tests from ssh run over wi-fi. Will try some other combinations later. Just tried with no application running, 20MB run fine, 30MB run very slow so it was probably swapping to card a lot. Turned off swap and could go only to 25MB. The test locks memory immediately after allocation (man mlock), so it should not swap pages out of RAM, and that's why it requires to be run as root. As for memory limits, I tried to explain in one of the prevoious posts, initially the tester can't allocate more than ~20MB of memory. But the next time you launch memtester, it can allocate 25MB, so increasing memory allocation size in small steps allows it to allocate up to 40MB in the end with swap turned off! Probably the system sees that more memory is required and begins to stop some of the unneeded services to free memory (that's just only a guess, did not do much experiments here yet). It can't do that fast, so if you request 40MB too early, it will fail. Did you run memtester with my last patch? It contains this gradually increasing allocation size trick automatically, so you don't need to run memtester many times and can specify 40MB at once. Of course you should not run any other application at the same time :) Test went fine, no errors. Done over bluetooth connection with full brightess on, battery almost full. Will try when battery is low (over wi-fi at home). In my tests this error is also not always reproducible. If I could identify physical address of a bad page (the system should have properly working /dev/mem for this), I could collect some statistics. For example I could check if its physical location is always the same and whether supposedly successful tests did actually allocate this part of memory. Surely it would be much better if memtester could access (almost) all the physical memory at once. Otherwise it can't provide reliable and trustworthy results. Probably boot time memtester similar to memtest86 that runs before the system loads can do this work best, but I wonder if it is easy to access framebuffer to print some results from it. One more (weird) idea is to try adding some syscall for allocation of physical memory at any address (moving its original content to some other place if it is occupied), so it would be able to access and test (almost) all the physical memory while running the system at the same time. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)
On Tuesday 19 September 2006 00:03, you wrote: [...] > An interesting observation is that you need to gradually increase the size > of tested memory block. You need to start with testing 20MB first, then you > can try 25MB and so on up to 43MB. If you try to allocate and test a large > block of memory too early, memtester will just get killed. > > As for the failures, only the last two hex digits of faulty address always > contain 'a5' and it is a bit strange. I expected that offset within a page > would remain the same (I changed malloc to mmap in order to always allocate > memory buffer at a page boundary ) and unless pages have size equal to 256 > bytes, it is inconsistent. A small update. As I checked manual [1], a minimal page size for arm926ej-s cpu is in fact 1KB (tiny page). So inconsistency is now resolved. I have patched memtester to gradually allocate memory starting from 20MB to the size specified in a command line, so it is possible to check larger blocks without any extra tricks, you can download this modified memtester here: http://ufo2000.xcomufo.com/files/memtester-n770.tar.gz If you are going to try it (and it may be a really good idea), it should be run as root. The first argument is the size of memory block to be tested (in megabytes), the second optional argument is the number of passes. Here is a result of running it on my device: Nokia770-26:/media/mmc1# ./memtester 40 1 memtester version 4.0.5 (32-bit) Copyright (C) 2005 Charles Cazabon. Licensed under the GNU General Public License version 2 (only). pagesize is 4096 pagesizemask is 0xf000 want 40MB (41943040 bytes) got 40MB (41943040 bytes), virtual address=0x40128000, trying mlock ...locked. Loop 1/1: Stuck Address : testing 0FAILURE: possible bad address line at offset 0x009899a5 (page offset 1a5). Skipping to next test... Random Value: FAILURE: 0x3f770c1e != 0x3f77 at offset 0x004899a5 (page offset 1a5). FAILURE: 0xc50dee8d != 0xc50d at offset 0x004899a5 (page offset 1a5). Compare XOR : FAILURE: 0x0e119ff2 != 0x0e10 at offset 0x004899a5 (page offset 1a5). Compare SUB : FAILURE: 0x7d558974 != 0x5ca0 at offset 0x004899a5 (page offset 1a5). Compare MUL : Compare DIV : ok FAILURE: 0x7febf0e8 != 0x7feb at offset 0x004899a5 (page offset 1a5). Compare OR : FAILURE: 0x7b69b068 != 0x7b69 at offset 0x004899a5 (page offset 1a5). Compare AND : Sequential Increment: ok Solid Bits : testing 1FAILURE: 0x != 0x at offset 0x004899a5 (page offset 1a5). Block Sequential: testing 1FAILURE: 0x01010101 != 0x0101 at offset 0x004899a5 (page offset 1a5). Checkerboard: testing 0FAILURE: 0x != 0x at offset 0x004899a5 (page offset 1a5). Bit Spread : testing 0FAILURE: 0xfffa != 0x at offset 0x004899a5 (page offset 1a5). Bit Flip: testing 0FAILURE: 0x0001 != 0x at offset 0x004899a5 (page offset 1a5). Walking Ones: testing 0FAILURE: 0xfffe != 0x at offset 0x004899a5 (page offset 1a5). Walking Zeroes : testing 0FAILURE: 0x0001 != 0x at offset 0x004899a5 (page offset 1a5). So faulty address is always reported to have offset 1a5 within a page on every run. Now the next thing to do is to identify physical address for use with BadRAM kernel patch. > I also wanted to detect physical address of a faulty memory region. I tried > to open '/dev/mem', read it one page at a time and compare its content with > the data from a faulty page. Unfortunately this does not work on Nokia 770 > and segfaults on reading from '/dev/mem'. The same code works fine on > desktop x86 pc and has no problems identifying physical address for any > page. Test programs were always run as root. I would really like to hear something from Nokia regarding this problem. There may be a few other devices with faulty memory considering some browser crash reports, reboots and instability for some people, a possible example can be seen here (though the reporter did not run the memory test as adviced): https://garage.maemo.org/tracker/index.php?func=detail&aid=84&group_id=54&atid=269 That's not a tragedy and software solution can probably resolve this problem. As you know, bad blocks are common for flash and jffs2 file system handles this issue. RAM can be probably treated in a similar way by using something like BadRAM kernel patch [2] [1] http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf [2] http://rick.vanrein.org/linux/badram/ ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)
On Monday 11 September 2006 00:34, Olivier ROLAND wrote: > > After playing with the device for some time, I got the same problem with > > lzma program this evening. And memtester also confirms that the memory is > > really defective :( > > > > # ./memtester 20 > > memtester version 4.0.5 (32-bit) > > Copyright (C) 2005 Charles Cazabon. > > Licensed under the GNU General Public License version 2 (only). > > > > pagesize is 4096 > > pagesizemask is 0xf000 > > want 20MB (20971520 bytes) > > got 20MB (20971520 bytes), trying mlock ...locked. > > Loop 1: > > Stuck Address : testing 0FAILURE: possible bad address line at > > offset 0x0037e9a5. > > Skipping to next test... > > Random Value: FAILURE: 0xdeb98374 != 0xdeb9 at offset > > 0x000fe9a4. > > FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4. > > Compare XOR : FAILURE: 0x50467c54 != 0x5046 at offset > > 0x000fe9a4. > > Compare SUB : FAILURE: 0xb069e1c0 != 0xdc20 at offset > > 0x000fe9a4. > > ... [...] > Hum ... very interesting memtester give non reproductible result on my > device. > and now lzma test failed also ... > Battery is low. We definitively need to investigate this a little more. > The good news is that your device is probably not broken. (or mine is > also ;-) ) > All this should definitively interest Nokia people ... Well, for the last days I tested memory occasionally and observed problem also with a fully recharged battery at least once :( An interesting observation is that you need to gradually increase the size of tested memory block. You need to start with testing 20MB first, then you can try 25MB and so on up to 43MB. If you try to allocate and test a large block of memory too early, memtester will just get killed. As for the failures, only the last two hex digits of faulty address always contain 'a5' and it is a bit strange. I expected that offset within a page would remain the same (I changed malloc to mmap in order to always allocate memory buffer at a page boundary ) and unless pages have size equal to 256 bytes, it is inconsistent. I also wanted to detect physical address of a faulty memory region. I tried to open '/dev/mem', read it one page at a time and compare its content with the data from a faulty page. Unfortunately this does not work on Nokia 770 and segfaults on reading from '/dev/mem'. The same code works fine on desktop x86 pc and has no problems identifying physical address for any page. Test programs were always run as root. Any other ideas? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] CPU Info
On Monday 11 September 2006 23:51, [EMAIL PROTECTED] wrote: > When executing: cat /proc/cpuinfo I see the following: > Processor : ARM926EJ-Sid(wb) rev 3 (v5l) > BogoMIPS: 125.76 > Features: swp half thumb fastmult edsp java > CPU implementer : 0x41 > CPU architecture: 5TEJ > CPU variant : 0x0 > CPU part: 0x926 > CPU revision: 3 > Cache type : write-back > Cache clean : cp15 c7 ops > Cache lockdown : format C > Cache format: Harvard > I size : 32768 > I assoc : 4 > I line length : 32 > I sets : 256 > D size : 16384 > D assoc : 4 > D line length : 32 > D sets : 128 > > Can anybody explain what the java feature in the Features mean? Probably the explanation is here (on the first page): http://www.arm.com/pdfs/DVI0035B_926_PO.pdf There are lots of other interesting docs at http://www.arm.com by the way. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)
On Sunday 10 September 2006 11:36, Olivier ROLAND wrote: > Your test work fine on my device. > I see that you run it from /media/mmc1so I guess you format your memory > card with ext2. > Mine still vfat so I can't. If you got same error when running from > internal memory then your device is broken. Thanks a lot for finding time and running the test. Today in the morning I could not reproduce this bug. The device battery just was recharged during night. As nothing else was changed (I checked uptime to be sure that it did not reboot or something), I see three possible explanations (may be wrong, I'm not hardware expert): * page with the faulty memory bit was allocated to some other process * cpu or memory chip was just overheated because of heavy use and the bug disappeared as the temperature got back to normal * maybe the bug is somewhat related to low battery charge level, maybe the battery was unable to provide enough voltage or something for reliable operation I did some search and found this utility for testing memory on non-x86 hardware: http://pyropus.ca/software/memtester/ For those who are lazy to compile it, the binary is here: http://ufo2000.xcomufo.com/files/memtester.gz After playing with the device for some time, I got the same problem with lzma program this evening. And memtester also confirms that the memory is really defective :( # ./memtester 20 memtester version 4.0.5 (32-bit) Copyright (C) 2005 Charles Cazabon. Licensed under the GNU General Public License version 2 (only). pagesize is 4096 pagesizemask is 0xf000 want 20MB (20971520 bytes) got 20MB (20971520 bytes), trying mlock ...locked. Loop 1: Stuck Address : testing 0FAILURE: possible bad address line at offset 0x0037e9a5. Skipping to next test... Random Value: FAILURE: 0xdeb98374 != 0xdeb9 at offset 0x000fe9a4. FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4. Compare XOR : FAILURE: 0x50467c54 != 0x5046 at offset 0x000fe9a4. Compare SUB : FAILURE: 0xb069e1c0 != 0xdc20 at offset 0x000fe9a4. ... By the way, I have seen some reports about random device reboots, maybe these people also suffer from defective memory problem. So maybe it is a good idea for everyone to test their memory. Though use it at your own risk, I can't be sure that this test program is working correctly and always provides valid results (I only found it today). Well, as now the problem is identified, it is time to think how to solve it. The first task is making a proper memory testing utility. As memtester needs to allocate memory for testing and lots of memory is already taken by IT OS software and libraries, we can only test a small part of memory (only ~1/3 in the test above). Maybe it is possible to patch kernel (or it already provides such functionality) to allocate any physical memory page for us (relocating its data to some other place if it is already occupied by some other process). If it is possible, we would be able to check all the physical memory except for probably the part occupied by the kernel itself. The next task would be to make some way to use BadRAM kernel patch on Nokia 770: http://rick.vanrein.org/linux/badram/ Preferably physical addresses of the defective parts of memory should be stored somewhere so that they survive reflashing (r&d mode and other flags are stored in such a way, right?). If BadRAM patch becomes a part of standard Nokia 770 kernel, it can help to make use of the memory chips that otherwise would have to be replaced. I wonder how much does Nokia 770 memory chip cost? By the way, maybe Nokia already has some utility for hardware diagnistics and it could become available for download? There would be no need to reinvent the wheel in this case. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)
Hello All, I'm sorry for a long chunk of quoted text at the end of this message (it describes the sympthoms of the problem), but looks like I got an almost reliable proof that there is something wrong with the hardware of my device :( I tried to find some software that could be used for benchmarking and LZMA SDK (http://www.7-zip.org/sdk.html) looked like an interesting option for doing it. But when run on Nokia 770, it sometimes works normally and sometimes fails with the following error message: /media/mmc1 $ time ./lzma b -d19 LZMA 4.43 Copyright (c) 1999-2006 Igor Pavlov 2006-06-04 CompressingDecompressing Error: CRC Error Command exited with non-zero status 1 real1m 37.14s user1m 36.10s sys 0m 0.74s As you see, it failed internal test and was unable to decompress data back correctly. LZMA is an advanced compression algorithm and uses quite a lot of memory (it shows ~20MB memory usage in top with '-d19' option). If any of the bits within this memory block has problems, it can affect data integrity and cause incorrect compression or decompression. So probably lzma can be also used as some kind of memory checker. But it may be also some problem in LZMA code and not in my Nokia 770 hardware, so I would like to ask somebody to run the same test and check if the same problem can be reproduced. You can download the sources of LZMA SDK using this link: http://prdownloads.sourceforge.net/sevenzip/lzma443.tar.bz2?download Decompress this archive, change to 'C/7zip/Compress/LZMA_Alone' directory and run 'make -f makefile.gcc' to compile it. Alternatively you can use my compiled binary: http://ufo2000.xcomufo.com/files/lzma.gz Considering that this test program works fine in scratchbox with qemu, LZMA SDK page mentions performance on ARM and the existence of LZMA debian package for ARM, I think that software bug theory is not very relevant, but it still needs to be confirmed. I will wait for feedback in order to confirm if the problem really exists in my hardware. But looks like it is a high probability that I will have to make some kind of more advanced memory checker, try to identify faulty memory physical address and experiment with badram kernel patch. On Wednesday 23 August 2006 23:09, you wrote: > > > Also I noticed that gstreamer is not very reliable, at least when using > > > it from mplayer. It can freeze or reboot the device sometimes. That's > > > not something that should be expected from high level API. If I detect > > > some reliable pattern in reproducing these bugs, I'll report it to > > > bugzilla for sure. But right now just using mplayer and lots of seeking > > > in video can cause these bugs reasonably fast. ... > Earlier I noticed problems with sound output getting blocked that could be > fixed by bult-in audio or video player. When trying to play anything it > first shows error message. After the second attempt either the sound got > fixed or the device rebooted. I suspected that something could get wrong > with dsp and standard audio player is able to reset it. That was observed > when using fdsrc element for feeding data to the decoder in mplayer. On > stopping/resuming playback, probably partial audio frames could be feeded > to mp3 decoder and that might result in its misbehaviour. ... > Now only complete mp3 audio frames can be sent to dspmp3sink. Anyway, first > everything was ok and I even suspected that I will not encounter any > problems at all. But after a few hours I got several reboots. After the > last reboot even wifi started working strange (could not connect using ssh, > it just showed various errors). Turning the device off, waiting for a few > minutes and turning it on again got everything back to normal. Now I > suspect that it could probably be overheating or some other hardware > problem (the device worked with wifi on and heavy cpu usage because of > decoding video for a long time). I'll keep an eye on it and will report > again if the problems keep showing up and if their source becomes more > clear. ... > I tried swap a long time ago on IT2005, that was done in order to make gcc > work on Nokia 770 to try compiling something before I installed > scratchbox :) Anyway, I did not like the stability as gcc started to fail > with internal compiler errors. So I decided not to use swap as long as it is > enough memory for what I need. > > Also there was some swap related report about the problem with mplayer: > http://www.internettablettalk.com/forums/showpost.php?p=20068&postcount=96 > > But maybe I should give swap another try on IT2006 and see if it helps to > improve stability. > > By the way, I already asked this question in the mailing list long time > ago, but are there any tools for hardware diagnostics on Nokia 770? > Something like memtest86 could probably be very useful. > > Though availablility of hardware diagnostics tools could probably result in > more devices getting returned for replacement w
Re: [maemo-developers] problem with dspmp3sink (was: problem with gstreamer and dsppcm)
On Monday 21 August 2006 18:34, Charles 'Buck' Krasic wrote: > Just in case you have not done it already, enabling swap in your device > can help a lot to prevent out-of-memory errors.Maybe this will help > with mplayer/gstreamer stability. > > I personally suspect a design flaw in the current Linux VM subsystem. > I've observed that if an application allocates memory rapidly, the > kernel may fail to reclaim pages quickly enough from the page and buffer > caches (they are only caches after all), so it actually denies the > allocation request. For example, with zero swap, on a machine with > 1G of ram, and >500M of it pseudo-free (used by caches), I've seen > moderate allocations fail--like when starting an application like > firefox.Enabling even a small amount of swap seems to dramatically > change this behaviour. Thanks for the information, this is interesting. I tried swap a long time ago on IT2005, that was done in order to make gcc work on Nokia 770 to try compiling something before I installed scratchbox :) Anyway, I did not like the stability as gcc started to fail with internal compiler errors. So I decided not to use swap as long as it is enough memory for what I need. Also there was some swap related report about the problem with mplayer: http://www.internettablettalk.com/forums/showpost.php?p=20068&postcount=96 But maybe I should give swap another try on IT2006 and see if it helps to improve stability. By the way, I already asked this question in the mailing list long time ago, but are there any tools for hardware diagnostics on Nokia 770? Something like memtest86 could probably be very useful. Though availablility of hardware diagnostics tools could probably result in more devices getting returned for replacement with otherwise undetected problems and have negative impact on Nokia profit (just joking). ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] automatic byte order check
On Monday 21 August 2006 10:45, Detlef Schmicker wrote: > I had a look at the vncviewer and saw, that it is working in the sandbox > in connection with vino (gnome vnc server). On the device the CoRRE > encoding does not work. > > Probably it is a byte order problem. The code has a lot of byte order > (e.g. GUINT16_TO_BE). Is there a way to automaticaly warn critical > points at compilation? Are there any tools? I'm not completely sure if understood your post correctly, but cpu used in Nokia 770 is little endian (the same as x86). So it is unlikely to have byte order or endian problems here. But ARM is alignment sensitive, so you may have problems because of bad alighment, I started making a page on wiki describing this issue (still very incomplete): http://maemo.org/maemowiki/PortingFromX86ToARM I also tried to search for tools that could identify alignment problems automatically, but did not find anything useful. Probably the most easy way to make such tool is to modify valgrind to track alignment for each memory access operation. But don't know, I ended up finding and fixing such problems in my code manually without the help of any tools :) ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] problem with dspmp3sink
On Monday 21 August 2006 10:32, Eero Tamminen wrote: > > Also I noticed that gstreamer is not very reliable, at least when using > > it from mplayer. It can freeze or reboot the device sometimes. That's not > > something that should be expected from high level API. If I detect some > > reliable pattern in reproducing these bugs, I'll report it to bugzilla > > for sure. But right now just using mplayer and lots of seeking in video > > can cause these bugs reasonably fast. > > First I would recommend using just "top" to see whether mplayer > is either: > - Leaking memory > - Otherwise using too much memory > Either by itself or forcing gstreamer to do that. > > If that is the case, the bug is in the mplayer (or gstreamer (plugin)) > and it needs to be fixed. For debugging the leaks, I would recommend > using Valgrind on x86. Thanks for your reply and debugging advices. Mplayer does not seem lo leak any memory as far as I tested it today. Earlier I noticed problems with sound output getting blocked that could be fixed by bult-in audio or video player. When trying to play anything it first shows error message. After the second attempt either the sound got fixed or the device rebooted. I suspected that something could get wrong with dsp and standard audio player is able to reset it. That was observed when using fdsrc element for feeding data to the decoder in mplayer. On stopping/resuming playback, probably partial audio frames could be feeded to mp3 decoder and that might result in its misbehaviour. Experimented with mplayer for a few hours today while preparing the next release, but using fakesrc instead as described here: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/manual/html/section-data-spoof.html Now only complete mp3 audio frames can be sent to dspmp3sink. Anyway, first everything was ok and I even suspected that I will not encounter any problems at all. But after a few hours I got several reboots. After the last reboot even wifi started working strange (could not connect using ssh, it just showed various errors). Turning the device off, waiting for a few minutes and turning it on again got everything back to normal. Now I suspect that it could probably be overheating or some other hardware problem (the device worked with wifi on and heavy cpu usage because of decoding video for a long time). I'll keep an eye on it and will report again if the problems keep showing up and if their source becomes more clear. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] problem with dspmp3sink (was: problem with gstreamer and dsppcm)
On Friday 18 August 2006 22:16, you wrote: > As the gstreamer got some attention in the mailing list, I think it is a > good chance to remind that I'm still having problems with it too: > http://maemo.org/pipermail/maemo-developers/2006-August/005060.html > > I need to know exact audio playing position when using dspmp3sink in order > to properly synchronize video with it in mplayer (for '-ao -gst -ac dspmp3' > options): https://garage.maemo.org/projects/mplayer/ > > So far I did not succeed. Probably I'm missing something trivial and the > help from somebody else or some kind of brainstorming could solve this > problem very fast. Well, that was really something trivial. It was just needed to set 'sync' property to TRUE for dspmp3sink element and gst_element_query position() function starts working! :-) Well, I was almost sure that I tried this 'sync' property at an early stage of experimenting so it took so long to try it again and figure out that it actually works. I wish somebody with more gstreamer knowledge could provide me with some hint, that would save really a lot of my time doing some silly experiments trying to find a reliable way to estimate this sound delay :-) On a positive side, mplayer with a better gstreamer sound output support with proper audio/video sync will be available really soon. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] problem with dspmp3sink (was: problem with gstreamer and dsppcm)
On Friday 18 August 2006 11:21, Zeeshan Ali wrote: > > - The licens of dsppcmsrc/sink is LGPL, can I find the source anywhere > > - instead of raw data, it would be even better to use dspilbc. > >Is there a way to store the captured data to a file, in a way it can > >be played back again (some wrapper)? > >Eeh! just looked into the plugin and really does say LGPL. I am > quite sure, this is a mistake on the developer's behalf since the > plugin is definitly not LGPL (atleast yet) or even under any free/open > license. As the gstreamer got some attention in the mailing list, I think it is a good chance to remind that I'm still having problems with it too: http://maemo.org/pipermail/maemo-developers/2006-August/005060.html I need to know exact audio playing position when using dspmp3sink in order to properly synchronize video with it in mplayer (for '-ao -gst -ac dspmp3' options): https://garage.maemo.org/projects/mplayer/ So far I did not succeed. Probably I'm missing something trivial and the help from somebody else or some kind of brainstorming could solve this problem very fast. Also I noticed that gstreamer is not very reliable, at least when using it from mplayer. It can freeze or reboot the device sometimes. That's not something that should be expected from high level API. If I detect some reliable pattern in reproducing these bugs, I'll report it to bugzilla for sure. But right now just using mplayer and lots of seeking in video can cause these bugs reasonably fast. Absence of the sources for dspmp3sink does not help for sure, it makes you feel helpless with no source to get additional information and no chance to fix the bugs that may reside in these closed source components. Maybe it is possible to have some description of using mp3 decoder dsp task without gstreamer at all? It probably can reduce the number of intermediate layers and improve reliability if the only thing that we need is just decoding of mp3 data using DSP core. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
[maemo-developers] dspmp3sink and gst_element_query_position() function
Hello All, I'm trying to get current playing position from dspmp3sink, but gst_element_query position() function fails. The same code works fine for dsppcmsink. Did anybody encounter such problem? What solution can be used? ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] libxv status?
On Tuesday 25 July 2006 09:01, you wrote: > > http://www.internettablettalk.com/forums/showthread.php?t=2405 > > > > Actually mplayer works surprisingly fast and has performance not much > > inferior to default video player on Nokia 770 that is using DSP. And > > that all is even without hardware colorspace conversion support! > > > > So is it possible to have an accelerated version of libxv on Nokia 770 > > that would support colorspace conversion and scaling (no matter whether > > using video controller capabilities or relay this task to DSP)? So that > > a more universal and well supported ARM core could deal only with video > > stream decoding. > > We don't have any plans to do this for the 770. Thanks for your reply. In order not to take much of your time, just a few more questions which require only yes/no/maybe answers :) Does Nokia 770 hardware really support YUV colorspace? Is it technically possible (I'm not asking whether it is planned by Nokia now) to have some simple API for YUV colorspaces support added probably as part of libxsp? Is Nokia interested in getting any assistance from the community (from me for example) in improving performance and capabilities of the software and libraries preinstalled on the device? I'm interested in good video support and also game development (official support for twice lower resolution in SDL using pixel doubling, support for portrait/landscape screen orientation modes, background music playback utilizing DSP core, virtual keyboard for X11 applications and SDL in particular, etc.). So I don't mind contributing some of my free time to do some work in order to get all this real. Thanks. ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers