Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-19 Thread Siarhei Siamashka

On 9/19/06, Frantisek Dufka <[EMAIL PROTECTED]> wrote:


> Just few ideas:
> software - bug is swapping/pagefault code?, bad ram timings?, too high
> CPU clock?


That's an interesting idea. It seems worth trying to downclock the
device to check whether it improves stability. Does anybody know how
to do this?


> hardware - high power requirements - does it happen more when brightness
> is high or mem tester is run in ssh over wi-fi?


I run all my tests from ssh over wi-fi. I will try some other
combinations later.


> Just tried with no application running, 20MB run fine, 30MB run very
> slow so it was probably swapping to card a lot. Turned off swap and
> could go only to 25MB.


The test locks memory immediately after allocation (see man mlock), so
its pages should not be swapped out of RAM; that is also why it has to
be run as root. As for memory limits, as I tried to explain in one of
the previous posts, initially the tester can't allocate more than ~20MB
of memory. But the next time you launch memtester, it can allocate
25MB, so increasing the allocation size in small steps allows it to
allocate up to 40MB in the end with swap turned off! Probably the
system sees that more memory is required and begins to stop some of
the unneeded services to free memory (that's only a guess, I have not
done many experiments here yet). It can't do that fast, so if you
request 40MB too early, it will fail. Did you run memtester with my
last patch? It applies this gradually increasing allocation size
trick automatically, so you don't need to run memtester many times and
can specify 40MB at once. Of course you should not run any other
application at the same time :)
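
A rough sketch of the "gradually increasing allocation size" schedule
described above, written in Python rather than memtester's C. Only the
20MB starting point comes from this thread; the 5MB step and the
function name are assumptions made for illustration and are not taken
from the actual patch.

```python
def allocation_steps(target_mb, start_mb=20, step_mb=5):
    """Return the block sizes (in MB) to try in order, ending at target_mb.

    start_mb=20 is the size that could be allocated right away in the
    tests above; step_mb=5 is an assumed increment, not the patch's value.
    """
    if target_mb <= start_mb:
        # Small requests need no warm-up steps.
        return [target_mb]
    sizes = list(range(start_mb, target_mb, step_mb))
    sizes.append(target_mb)
    return sizes


if __name__ == "__main__":
    # Each size would be allocated, locked and released before the next.
    print(allocation_steps(40))
```

A real tester would allocate and mlock each size in turn and release it
before moving on, giving the system time to free memory between steps.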


> Test went fine, no errors. Done over bluetooth
> connection with full brightess on, battery almost full. Will try when
> battery is low (over wi-fi at home).


In my tests this error is also not always reproducible. If I could
identify physical address of a bad page (the system should have
properly working /dev/mem for this), I could collect some statistics.
For example I could check if its physical location is always the same
and whether supposedly successful tests did actually allocate this
part of memory.

Surely it would be much better if memtester could access (almost) all
the physical memory at once. Otherwise it can't provide reliable and
trustworthy results. A boot-time memtester similar to memtest86,
running before the system loads, could probably do this best, but I
wonder if it is easy to access the framebuffer to print results from
it. One more (weird) idea is to add a syscall for allocating physical
memory at any given address (moving its original content to some other
place if it is occupied), so that a tester could access and test
(almost) all the physical memory while the system is running.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers


Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-19 Thread Marius Gedminas
On Tue, Sep 19, 2006 at 03:06:47AM +0300, Siarhei Siamashka wrote:
> I would really like to hear something from Nokia regarding this problem. There
> may be a few other devices with faulty memory considering some browser crash
> reports, reboots and instability for some people, a possible example can be
> seen here (though the reporter did not run the memory test as adviced): 
> https://garage.maemo.org/tracker/index.php?func=detail&aid=84&group_id=54&atid=269

I have experienced a nondeterministic segfault of a command-line application 
once:

  http://maemo.org/pipermail/maemo-developers/2006-August/005370.html

Also, maemo_af_desktop crashes every now and then
(/var/lib/dsme/lifeguard-resets tells me it crashed 35 times already),
but this may be the fault of a buggy statusbar applet (I've
osso-statusbar-cpu installed) rather than bad RAM.

I'll try to find some time to run your version of memtester.

Marius Gedminas
-- 
This sentence does in fact not have the property it claims not to have.




Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-19 Thread Kimmo Hämäläinen
On Tue, 2006-09-19 at 11:47, ext Frantisek Dufka wrote:
> Kimmo Hämäläinen wrote:
> >  So, it's unclear yet if it is a software or hardware
> > problem.
> 
> Just few ideas:
> software - bug is swapping/pagefault code?, bad ram timings?, too high 
> CPU clock?
> hardware - high power requirements - does it happen more when brightness 
> is high or mem tester is run in ssh over wi-fi?
> 
> Just tried with no application running, 20MB run fine, 30MB run very 
> slow so it was probably swapping to card a lot. Turned off swap and 
> could go only to 25MB. Test went fine, no errors. Done over bluetooth 
> connection with full brightess on, battery almost full. Will try when 
> battery is low (over wi-fi at home).

Yes, it would need to be reproducible in several different devices. The
guy here that tried to reproduce it currently thinks that Siarhei's unit
is broken.

BR; Kimmo

> 
> Frantisek


Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-19 Thread Frantisek Dufka

Kimmo Hämäläinen wrote:

>  So, it's unclear yet if it is a software or hardware
> problem.


Just a few ideas:
software - a bug in the swapping/pagefault code? bad RAM timings? too
high a CPU clock?
hardware - high power requirements - does it happen more often when
brightness is high or when the mem tester is run in ssh over wi-fi?


I just tried with no applications running: 20MB ran fine, 30MB ran very
slowly, so it was probably swapping to the card a lot. With swap turned
off I could only go up to 25MB. The test went fine, no errors. This was
done over a bluetooth connection with full brightness on and the battery
almost full. I will try again when the battery is low (over wi-fi at
home).


Frantisek


Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-19 Thread Kimmo Hämäläinen
On Tue, 2006-09-19 at 03:06, ext Siarhei Siamashka wrote:
...
> I would really like to hear something from Nokia regarding this problem. There
> may be a few other devices with faulty memory considering some browser crash
> reports, reboots and instability for some people, a possible example can be
> seen here (though the reporter did not run the memory test as adviced): 
> https://garage.maemo.org/tracker/index.php?func=detail&aid=84&group_id=54&atid=269
> 
> That's not a tragedy and software solution can probably resolve this problem. 
> As you know, bad blocks are common for flash and jffs2 file system handles
> this issue. RAM can be probably treated in a similar way by using something
> like BadRAM kernel patch [2]

We have noticed your e-mail and tried to reproduce the corruption, but
still without success. I myself have noticed an apparent JFFS2
corruption once, but that too was not reproducible (and could have been
caused by RAM). So, it's unclear yet if it is a software or hardware
problem.

BR, Kimmo

> 
> [1] http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf
> [2] http://rick.vanrein.org/linux/badram/


Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-18 Thread Siarhei Siamashka
On Tuesday 19 September 2006 00:03, you wrote:

[...]

> An interesting observation is that you need to gradually increase the size
> of tested memory block. You need to start with testing 20MB first, then you
> can try 25MB and so on up to 43MB. If you try to allocate and test a large
> block of  memory too early, memtester will just get killed.
>
> As for the failures, only the last two hex digits of faulty address always
> contain 'a5' and it is a bit strange. I expected that offset within a page
> would remain the same (I changed malloc to mmap in order to always allocate
> memory buffer at a page boundary ) and unless pages have size equal to 256
> bytes, it is inconsistent.

A small update. According to the manual [1], the minimal page size for
the arm926ej-s cpu is in fact 1KB (tiny page). So the inconsistency is
now resolved.

I have patched memtester to allocate memory gradually, starting from
20MB and going up to the size specified on the command line, so it is
possible to check larger blocks without any extra tricks. You can
download this modified memtester here:
http://ufo2000.xcomufo.com/files/memtester-n770.tar.gz

If you are going to try it (and it may be a really good idea), it should be
run as root. The first argument is the size of the memory block to be tested
(in megabytes), the second optional argument is the number of passes.

Here is a result of running it on my device:

Nokia770-26:/media/mmc1# ./memtester 40 1
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 40MB (41943040 bytes)
got  40MB (41943040 bytes), virtual address=0x40128000, trying mlock ...locked.
Loop 1/1:
  Stuck Address   : testing   0FAILURE: possible bad address line at offset 0x009899a5 (page offset 1a5).
Skipping to next test...
  Random Value: FAILURE: 0x3f770c1e != 0x3f77 at offset 0x004899a5 (page offset 1a5).
FAILURE: 0xc50dee8d != 0xc50d at offset 0x004899a5 (page offset 1a5).
  Compare XOR : FAILURE: 0x0e119ff2 != 0x0e10 at offset 0x004899a5 (page offset 1a5).
  Compare SUB : FAILURE: 0x7d558974 != 0x5ca0 at offset 0x004899a5 (page offset 1a5).
  Compare MUL :   Compare DIV : ok
FAILURE: 0x7febf0e8 != 0x7feb at offset 0x004899a5 (page offset 1a5).
  Compare OR  : FAILURE: 0x7b69b068 != 0x7b69 at offset 0x004899a5 (page offset 1a5).
  Compare AND :   Sequential Increment: ok
  Solid Bits  : testing   1FAILURE: 0x != 0x at offset 0x004899a5 (page offset 1a5).
  Block Sequential: testing   1FAILURE: 0x01010101 != 0x0101 at offset 0x004899a5 (page offset 1a5).
  Checkerboard: testing   0FAILURE: 0x != 0x at offset 0x004899a5 (page offset 1a5).
  Bit Spread  : testing   0FAILURE: 0xfffa != 0x at offset 0x004899a5 (page offset 1a5).
  Bit Flip: testing   0FAILURE: 0x0001 != 0x at offset 0x004899a5 (page offset 1a5).
  Walking Ones: testing   0FAILURE: 0xfffe != 0x at offset 0x004899a5 (page offset 1a5).
  Walking Zeroes  : testing   0FAILURE: 0x0001 != 0x at offset 0x004899a5 (page offset 1a5).

So the faulty address is always reported at offset 1a5 within a page on
every run. The next thing to do is to identify its physical address for use
with the BadRAM kernel patch.
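
For illustration, the page offset arithmetic can be checked with a few
lines of Python. This is only a sketch: the two offsets are copied from
the log above, the function name is made up, and the assumption is that
the patched memtester masks the buffer offset with a 1KB page size to
get its "page offset" value.

```python
def page_offset(offset, page_size):
    """Offset of a byte within its page; page_size must be a power of two."""
    return offset & (page_size - 1)


# Failure offsets copied from the memtester log above.
offsets = [0x009899a5, 0x004899a5]

if __name__ == "__main__":
    for size in (256, 1024, 4096):
        print(size, [hex(page_offset(o, size)) for o in offsets])
```

With a 1024-byte page both offsets reduce to 0x1a5, which matches the
"page offset 1a5" printed in the log.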

> I also wanted to detect physical address of a faulty memory region. I tried
> to open '/dev/mem', read it one page at a time and compare its content with
> the data from a faulty page. Unfortunately this does not work on Nokia 770
> and segfaults on reading from '/dev/mem'. The same code works fine on
> desktop x86 pc and has no problems identifying physical address for any
> page. Test programs were always run as root.

I would really like to hear something from Nokia regarding this problem.
There may be a few other devices with faulty memory, considering some
reports of browser crashes, reboots and instability from some people. A
possible example can be seen here (though the reporter did not run the
memory test as advised):
https://garage.maemo.org/tracker/index.php?func=detail&aid=84&group_id=54&atid=269

That's not a tragedy, and a software solution can probably resolve this
problem. As you know, bad blocks are common in flash and the jffs2 file
system handles this issue. RAM can probably be treated in a similar way by
using something like the BadRAM kernel patch [2].

[1] http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf
[2] http://rick.vanrein.org/linux/badram/


Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-18 Thread Siarhei Siamashka
On Monday 11 September 2006 00:34, Olivier ROLAND wrote:

> > After playing with the device for some time, I got the same problem with
> > lzma program this evening. And memtester also confirms that the memory is
> > really defective :(
> >
> > # ./memtester 20
> > memtester version 4.0.5 (32-bit)
> > Copyright (C) 2005 Charles Cazabon.
> > Licensed under the GNU General Public License version 2 (only).
> >
> > pagesize is 4096
> > pagesizemask is 0xf000
> > want 20MB (20971520 bytes)
> > got  20MB (20971520 bytes), trying mlock ...locked.
> > Loop 1:
> >   Stuck Address   : testing   0FAILURE: possible bad address line at
> > offset 0x0037e9a5.
> > Skipping to next test...
> >   Random Value: FAILURE: 0xdeb98374 != 0xdeb9 at offset
> > 0x000fe9a4.
> > FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4.
> >   Compare XOR : FAILURE: 0x50467c54 != 0x5046 at offset
> > 0x000fe9a4.
> >   Compare SUB : FAILURE: 0xb069e1c0 != 0xdc20 at offset
> > 0x000fe9a4.
> > ...

[...]

> Hum ... very interesting memtester give non reproductible result on my
> device.
> and now lzma test failed also ...
> Battery is low. We definitively need to investigate this a little more.
> The good news is that your device is probably not broken. (or mine is
> also ;-) )
> All this should definitively interest Nokia people ...

Well, over the last few days I tested memory occasionally and observed the
problem with a fully recharged battery as well, at least once :(

An interesting observation is that you need to increase the size of the
tested memory block gradually. You need to start by testing 20MB, then you
can try 25MB and so on up to 43MB. If you try to allocate and test a large
block of memory too early, memtester will just get killed.

As for the failures, only the last two hex digits of the faulty address are
always the same ('a5'), which is a bit strange. I expected the offset within
a page to remain the same (I changed malloc to mmap in order to always
allocate the memory buffer at a page boundary), and unless the page size is
256 bytes, this is inconsistent.

I also wanted to find the physical address of the faulty memory region. I
tried to open '/dev/mem', read it one page at a time and compare its content
with the data from the faulty page. Unfortunately this does not work on the
Nokia 770: it segfaults on reading from '/dev/mem'. The same code works fine
on a desktop x86 pc and has no problems identifying the physical address of
any page. Test programs were always run as root.
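
The page-by-page comparison can be sketched roughly like this (in
Python for brevity; this is not my actual test code). The function name
is made up, and an in-memory buffer stands in for '/dev/mem' so the
sketch runs anywhere without root. On a real system the file would be
'/dev/mem' opened in binary mode, and the returned offsets would be
physical addresses.

```python
import io

PAGE_SIZE = 4096  # page size reported by memtester in this thread


def find_page(mem, needle, page_size=PAGE_SIZE):
    """Return the byte offsets of every page in `mem` equal to `needle`.

    `mem` is any binary file-like object. With /dev/mem the offset of a
    matching page would be its physical address.
    """
    matches = []
    offset = 0
    while True:
        page = mem.read(page_size)
        if len(page) < page_size:
            break  # end of file or a short trailing read
        if page == needle:
            matches.append(offset)
        offset += page_size
    return matches


if __name__ == "__main__":
    # Fill the "faulty" page with a recognizable pattern before searching.
    marker = bytes(range(256)) * (PAGE_SIZE // 256)
    fake_mem = io.BytesIO(b"\x00" * PAGE_SIZE * 3 + marker + b"\x00" * PAGE_SIZE)
    print([hex(a) for a in find_page(fake_mem, marker)])
```

The locked test page has to hold a pattern that is unlikely to occur
anywhere else in memory, otherwise the scan reports false matches.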

Any other ideas?


Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-10 Thread Olivier ROLAND




Siarhei Siamashka wrote:

[...]

Hmm ... very interesting: memtester gives non-reproducible results on my
device, and now the lzma test has failed as well ...
The battery is low. We definitely need to investigate this a little more.
The good news is that your device is probably not broken (or mine is
broken too ;-) ).
All this should definitely interest the Nokia people ...






Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-10 Thread Siarhei Siamashka
On Sunday 10 September 2006 11:36, Olivier ROLAND wrote:

> Your test work fine on my device.
> I see that you run it from /media/mmc1so I guess you format your memory
> card with ext2.
> Mine still vfat so I can't. If you got same error when running from
> internal memory then your device is broken.

Thanks a lot for finding time and running the test.

This morning I could not reproduce the bug. The device battery had just
been recharged overnight. As nothing else had changed (I checked uptime to
be sure that it did not reboot or something), I see three possible
explanations (I may be wrong, I'm not a hardware expert):
* the page with the faulty memory bit was allocated to some other process
* the cpu or memory chip was just overheated because of heavy use and the
bug disappeared as the temperature got back to normal
* maybe the bug is somehow related to a low battery charge level, maybe the
battery was unable to provide enough voltage or something for reliable
operation

I did some searching and found this utility for testing memory on non-x86
hardware: http://pyropus.ca/software/memtester/
For those who are too lazy to compile it, the binary is here:
http://ufo2000.xcomufo.com/files/memtester.gz

After playing with the device for some time, I got the same problem with the
lzma program this evening. And memtester also confirms that the memory
is really defective :(

# ./memtester 20
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 20MB (20971520 bytes)
got  20MB (20971520 bytes), trying mlock ...locked.
Loop 1:
  Stuck Address   : testing   0FAILURE: possible bad address line at offset 0x0037e9a5.
Skipping to next test...
  Random Value: FAILURE: 0xdeb98374 != 0xdeb9 at offset 0x000fe9a4.
FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4.
  Compare XOR : FAILURE: 0x50467c54 != 0x5046 at offset 0x000fe9a4.
  Compare SUB : FAILURE: 0xb069e1c0 != 0xdc20 at offset 0x000fe9a4.
...

By the way, I have seen some reports about random device reboots; maybe
these people also suffer from defective memory. So maybe it is a good idea
for everyone to test their memory. Use it at your own risk though: I can't
be sure that this test program works correctly and always provides valid
results (I only found it today).

Well, now that the problem is identified, it is time to think about how to
solve it.

The first task is making a proper memory testing utility. As memtester needs
to allocate the memory it tests and lots of memory is already taken by IT OS
software and libraries, we can only test a small part of memory (only ~1/3 in
the test above). Maybe it is possible to patch the kernel (or maybe it already
provides such functionality) to allocate any physical memory page for us
(relocating its data to some other place if it is already occupied by some
other process). If that is possible, we would be able to check all the
physical memory except, probably, for the part occupied by the kernel itself.

The next task would be to find some way to use the BadRAM kernel patch on
the Nokia 770: http://rick.vanrein.org/linux/badram/
Preferably the physical addresses of the defective parts of memory should be
stored somewhere so that they survive reflashing (r&d mode and other flags
are stored in such a way, right?). If the BadRAM patch becomes a part of the
standard Nokia 770 kernel, it can help to make use of memory chips that
would otherwise have to be replaced. I wonder how much a Nokia 770 memory
chip costs?

By the way, maybe Nokia already has some utility for hardware diagnostics
and it could be made available for download? There would be no need to
reinvent the wheel in that case.


Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-10 Thread Olivier ROLAND




Your test works fine on my device.
I see that you run it from /media/mmc1, so I guess you formatted your
memory card with ext2.
Mine is still vfat, so I can't. If you get the same error when running from
internal memory then your device is broken.



Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-09 Thread Siarhei Siamashka
Hello All,

I'm sorry for the long chunk of quoted text at the end of this message
(it describes the symptoms of the problem), but it looks like I have got
almost reliable proof that there is something wrong with the hardware
of my device :(

I tried to find some software that could be used for benchmarking,
and the LZMA SDK (http://www.7-zip.org/sdk.html) looked like an interesting
option. But when run on the Nokia 770, it sometimes works
normally and sometimes fails with the following error message:

/media/mmc1 $ time ./lzma b -d19

LZMA 4.43 Copyright (c) 1999-2006 Igor Pavlov  2006-06-04

   Compressing                Decompressing


Error: CRC Error
Command exited with non-zero status 1
real    1m 37.14s
user    1m 36.10s
sys     0m 0.74s


As you can see, it failed its internal test and was unable to decompress the
data back correctly. LZMA is an advanced compression algorithm and uses quite
a lot of memory (top shows ~20MB memory usage with the '-d19' option). If any
of the bits within this memory block has problems, it can affect data
integrity and cause incorrect compression or decompression. So lzma can
probably also be used as some kind of memory checker.
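
As a rough illustration of the idea (not the LZMA SDK benchmark itself),
here is a small round-trip check using Python's standard lzma module.
The function name, the 1MB test size and the use of CRC32 are
assumptions made for this sketch. The 2^19 byte dictionary mirrors the
'-d19' option, but memory usage will differ from the SDK benchmark.

```python
import lzma
import random
import zlib


def lzma_round_trip_ok(size_bytes=1 << 20, seed=0, dict_size=1 << 19):
    """Compress and decompress test data, then compare CRC32 checksums.

    Returns False if the decompressed data differs from the input or the
    decoder rejects the stream. On healthy hardware it should return True.
    """
    rng = random.Random(seed)
    # Reproducible pseudo-random input built from one large random integer.
    data = rng.getrandbits(8 * size_bytes).to_bytes(size_bytes, "little")
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    try:
        restored = lzma.decompress(packed)
    except lzma.LZMAError:
        return False
    return zlib.crc32(restored) == zlib.crc32(data)


if __name__ == "__main__":
    print("round trip ok" if lzma_round_trip_ok() else "CRC mismatch")
```

Running it in a loop with different seeds would give a crude stress
test, though a failure alone would not say which address is bad.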

But it may also be a problem in the LZMA code and not in my Nokia 770
hardware, so I would like to ask somebody to run the same test and check
whether the same problem can be reproduced. You can download the sources of
the LZMA SDK using this link:
http://prdownloads.sourceforge.net/sevenzip/lzma443.tar.bz2?download
Decompress this archive, change to the 'C/7zip/Compress/LZMA_Alone' directory
and run 'make -f makefile.gcc' to compile it. Alternatively you can use my
compiled binary: http://ufo2000.xcomufo.com/files/lzma.gz

Considering that this test program works fine in scratchbox with qemu, that
the LZMA SDK page mentions performance on ARM, and that an LZMA debian
package exists for ARM, I think the software bug theory is not very likely,
but it still needs to be confirmed.

I will wait for feedback in order to confirm whether the problem really
exists in my hardware. But it looks quite probable that I will have to make
some kind of more advanced memory checker, try to identify the physical
address of the faulty memory and experiment with the badram kernel patch.

On Wednesday 23 August 2006 23:09, you wrote:

> > > Also I noticed that gstreamer is not very reliable, at least when using
> > > it from mplayer. It can freeze or reboot the device sometimes. That's
> > > not something that should be expected from high level API. If I detect
> > > some reliable pattern in reproducing these bugs, I'll report it to
> > > bugzilla for sure. But right now just using mplayer and lots of seeking
> > > in video can cause these bugs reasonably fast.

...

> Earlier I noticed problems with sound output getting blocked that could be
> fixed by bult-in audio or video player. When trying to play anything it
> first shows error message. After the second attempt either the sound got
> fixed or the device rebooted. I suspected that something could get wrong
> with dsp and standard  audio player is able to reset it. That was observed
> when using fdsrc element for feeding data to the decoder in mplayer. On
> stopping/resuming playback, probably partial audio frames could be feeded
> to mp3 decoder and that might result in its misbehaviour.

...

> Now only complete mp3 audio frames can be sent to dspmp3sink. Anyway, first
> everything was ok and I even suspected that I will not encounter any
> problems at all. But after a few hours I got several reboots. After the
> last reboot even wifi started working strange (could not connect using ssh,
> it just showed various errors). Turning the device off, waiting for a few
> minutes and turning it on again got everything back to normal. Now I
> suspect that it could probably be overheating or some other hardware
> problem (the device worked with wifi on and heavy cpu usage because of
> decoding video for a long time). I'll keep an eye on it and will report
> again if the problems keep showing up and if their source becomes more
> clear.

...

> I tried swap a long time ago on IT2005, that was done in order to make gcc
> work on Nokia 770 to try compiling something before I installed
> scratchbox :) Anyway, I did not like the stability as gcc started to fail
> with internal compiler errors. So I decided not to use swap as long as it is
> enough memory for what I need. 
>
> Also there was some swap related report about the problem with mplayer:
> http://www.internettablettalk.com/forums/showpost.php?p=20068&postcount=96
>
> But maybe I should give swap another try on IT2006 and see if it helps to
> improve stability.
>
> By the way, I already asked this question in the mailing list long time
> ago, but are there any tools for hardware diagnostics on Nokia 770?
> Something like memtest86 could probably be very useful.
>
> Though availablility of hardware diagnostics tools could probably result in
> more devices getting returned for replacement w