[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sat, 09 Jul 2005 19:35:24 -0500
From: David Jensen [EMAIL PROTECTED]
To: Bruce Dubbs [EMAIL PROTECTED]
References: [EMAIL PROTECTED]

Bruce Dubbs wrote:

David,

Did you add the test instructions to the Berkeley DB section? If so, it must have been while I was on vacation.

In any case, I tried to run it last night. The system was a 2GHz P4 with 1G RAM. When I left about 6PM it had been running several hours and supposedly had less than an hour to go. When I checked on it about 2PM today (20 hours later), the system was *very* slow. The screen did say that all 1562 tests were done.

I finally was able to get top working and found a load factor of 17! There were four processes that were using 25% of the CPU each -- all named something like tcl8.4. When I killed these processes, I got control of my system back. I also had to kill several other remnant processes.

Do you have any idea what was going on?

It always *best guesses* 1 hour! 4 processes is correct, *run_parallel 4*. It may have finished in a few more minutes. After it says the tests are done, it scans the output logs and prints the errors and whether the run failed or succeeded. That does take some time.

Now first, let me say the instructions before I got involved said: run_parallel run_std. Notice, no 4. This is like make -j. It created 1582 processes and filled a 12G partition with test directories! So I limited it to 4.

I have a dual 2.8 with hyperthreading on, thus top showed varying low to medium loads on the 4 pipes. Probably we should ditch the run_parallel, just add a note that it could be used.

I will run a test overnight without the run_parallel. Then use that SBU.

Also, I think it might be better to have the book use instructions something like:

    tclsh

From the tclsh prompt (%), run:

    source ../test/test.tcl
    run_parallel 4 run_std
    exit

    make realclean
    cd ..

Yes that is better.

Also there is a typo in the book. There is cd.. which should be cd .. (note the space).

I'll fix that.

--
David Jensen
--
http://linuxfromscratch.org/mailman/listinfo/blfs-dev
FAQ: http://www.linuxfromscratch.org/blfs/faq.html
Unsubscribe: See the above information page
[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sat, 09 Jul 2005 19:57:21 -0500
From: Bruce Dubbs [EMAIL PROTECTED]
To: David Jensen [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED]

David Jensen wrote:

It always *best guesses* 1 hour! 4 processes is correct, *run_parallel 4*. It may have finished in a few more minutes.

When I left, I was up to test 1350 or so. It said less than one hour and I waited 20 :(

After it says the tests are done, it scans the output logs and prints the errors and failed or successful. That does take some time.

Now first, let me say the instructions before I got involved said: run_parallel run_std. Notice, no 4. this is like make -j It created 1582 processes and filled a 12G partition with test directories! So I limited it to 4.

That is reasonable. I didn't check disk space, so that may be an issue. Do you know where the dirs were made? Current dir? /tmp?

I have a dual 2.8 with hyperthreading on, thus top showed varying low to medium loads on the 4 pipes. Probably we should ditch the run_parallel, just add a note that it could be used.

Not if it fills up 12G! That doesn't seem reasonable to me.

I will run a test overnight without the run_parallel. Then use that SBU.

OK. I'll try it too.

-- Bruce
[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sat, 09 Jul 2005 20:40:42 -0500
From: Bruce Dubbs [EMAIL PROTECTED]
To: David Jensen [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

David Jensen wrote:

Bruce Dubbs wrote:

Probably we should ditch the run_parallel, just add a note that it could be used.

Not if it fills up 12G! That doesn't seem reasonable to me.

I mean used with 'number of processors'.

I've lost the thread here. I'm not sure what you mean.

I will run a test overnight without the run_parallel. Then use that SBU.

I started it already. It's less than 10% loaded. I'm guessing 14 hours for me!

OK. I've started too. Right now I'm up to test 150. top shows:

    top - 20:35:49 up 4 days, 19:50,  1 user,  load average: 2.89, 3.43, 2.59
    Tasks:  95 total,   1 running,  94 sleeping,   0 stopped,   0 zombie
    Cpu(s):  7.8% us,  3.0% sy,  0.0% ni, 54.0% id, 34.6% wa,  0.2% hi,  0.5% si
    Mem:    513880k total,   391984k used,   121896k free,    39040k buffers
    Swap:  2097140k total,      904k used,  2096236k free,    78940k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     7508 bdubbs    20   0 20788 8140 1808 S  6.3  1.6   0:02.50 tclsh8.4
     4587 bdubbs    16   0 20088 7144 2020 S  5.3  1.4   0:09.05 tclsh8.4
     4590 bdubbs    17   0 20088 7144 2020 S  5.3  1.4   0:09.35 tclsh8.4
     7874 bdubbs    17   0 20792 8124 1792 S  4.6  1.6   0:02.01 tclsh8.4
     4591 bdubbs    17   0 20088 7144 2020 S  3.0  1.4   0:09.12 tclsh8.4
     4594 bdubbs    17   0 20088 7140 2020 S  2.3  1.4   0:09.02 tclsh8.4

To run, what I did was:

    $ cat > testit << EOF
    #!/usr/bin/tclsh
    source ../test/test.tcl
    run_parallel 4 run_std
    EOF
    $ chmod +x testit
    $ time ./testit

I'll let you know what happens.

-- Bruce
[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sat, 09 Jul 2005 21:46:46 -0500
From: David Jensen [EMAIL PROTECTED]
To: Bruce Dubbs [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

Bruce Dubbs wrote:

I've lost the thread here. I'm not sure what you mean.

You too?

    top - 21:40:26 up 14:51,  3 users,  load average: 1.66, 1.29, 0.66
    Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
    Cpu0 :  4.3% us,  3.7% sy,  0.0% ni,  45.8% id, 45.8% wa,  0.3% hi,  0.0% si
    Cpu1 :  2.3% us,  1.3% sy,  0.0% ni,  71.3% id, 25.0% wa,  0.0% hi,  0.0% si
    Cpu2 :  0.3% us,  0.0% sy,  0.0% ni,  98.7% id,  1.0% wa,  0.0% hi,  0.0% si
    Cpu3 :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
    Mem:   1034612k total,   996624k used,    37988k free,   166548k buffers
    Swap:   987988k total,      732k used,   987256k free,   658904k cached

I am running just 'run_std'. It has a completely different output: no guesses, no test numbers. After 2 hours:

    % run_std
    Test suite run started at: 19:41 07/09/05
    Sleepycat Software: Berkeley DB 4.3.28: (April 22, 2005)
    Running environment tests
    Running archive tests
    Running file operations tests
    Running locking tests
    Running logging tests
    Running memory pool tests
    Running mutex tests
    Running transaction tests
    Running deadlock detection tests
    Running subdatabase tests
    Running byte-order tests
    Running recno backing file tests
    Running DBM interface tests
    Running NDBM interface tests
    Running Hsearch interface tests
    Running secondary index tests

--
David Jensen
[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sun, 10 Jul 2005 11:00:08 -0500
From: Bruce Dubbs [EMAIL PROTECTED]
To: David Jensen [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

David Jensen wrote:

Bruce Dubbs wrote:

    $ cat > testit << EOF
    #!/usr/bin/tclsh
    source ../test/test.tcl
    run_parallel 4 run_std
    EOF
    $ chmod +x testit
    $ time ./testit

I forgot to time it, however, it printed the start and end times. I have 236 SBU. That seems about right, 80 SBU with run_parallel 4.

Now, I have:

    UNEXPECTED OUTPUT: WARNING: log record type __db_pg_new: not tested
    UNEXPECTED OUTPUT: WARNING: log record type __db_pg_prepare: not tested
    Regression Tests Failed
    Check UNEXPECTED OUTPUT lines.

I did not get these two warnings in four runs of run_parallel 4, or one run_parallel 10. I will send a report to Sleepycat.

Note: parallel 4 and parallel 10 ran about the same, 80 SBU.

I got (using parallel 4):

    02:40:58 (00:05:00) processes running: 10286 10287 10289 10292
    Starting test 1520 of 1582 parallel items. Rough guess: less than 1 hour left.
    Starting test 1530 of 1582 parallel items. Rough guess: less than 1 hour left.
    Starting test 1540 of 1582 parallel items. Rough guess: less than 1 hour left.
    Starting test 1550 of 1582 parallel items. Rough guess: less than 1 hour left.
    Starting test 1560 of 1582 parallel items. Rough guess: less than 1 hour left.
    Starting test 1570 of 1582 parallel items. Rough guess: less than 1 hour left.
    Starting test 1580 of 1582 parallel items. Rough guess: less than 1 hour left.
    Process 2: 451 commands executed in 02:59
    02:45:58 (00:05:00) processes running: 10286 10289 10292
    Process 4: 444 commands executed in 03:02
    Process 1: 438 commands executed in 03:02
    02:50:58 (00:05:00) processes running: 10289
    Process 3: 249 commands executed in 03:07
    All processes have exited.
    Checking output from ALL.OUT.1 ... UNEXPECTED OUTPUT: g.5 ddoyscript.tcl ./TESTDIR.1 2 6 o 5 done.
    Checking output from ALL.OUT.2 ... done.
    Checking output from ALL.OUT.3 ... done.
    Checking output from ALL.OUT.4 ... done.
    Regression tests failed. Review UNEXPECTED OUTPUT lines above for errors.
    Complete logs found in ALL.OUT.x files

    real    190m43.728s
    user    138m3.860s
    sys     18m12.954s

That equates to 87.9 SBU on my system. It took about 15 minutes from the start of the last test for the run to finish.

In order to automate, I had to change my testit script to:

    #!/usr/bin/tclsh
    source ../test/test.tcl
    run_parallel 4 run_std
    exit

Or else it doesn't exit and the time is invalid.

Did you get the statement "Regression tests failed."? Looking in TESTDIR.1 I do have the line:

    g.5 ddoyscript.tcl ./TESTDIR.1 2 6 o 5

but I have no idea what it means.

-- Bruce
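The SBU figure in this message can be reproduced with a little arithmetic: convert the `real` time printed by `time` to seconds and divide by the length of one SBU on the same machine. What follows is only a sketch. The 130.2-second SBU is an assumption, back-calculated from the two numbers quoted here (190m43.728s and 87.9 SBU) and never stated in the thread, and `to_sbu` is a hypothetical helper name.

```shell
# to_sbu ELAPSED SBU_SECONDS
# ELAPSED uses the "NNNmSS.SSSs" format printed by the shell's `time`.
# SBU_SECONDS is how long one SBU takes on this machine (assumed value).
to_sbu() {
    awk -v t="$1" -v unit="$2" 'BEGIN {
        split(t, p, "m")           # p[1] = minutes, p[2] = seconds + "s"
        sub(/s$/, "", p[2])        # drop the trailing "s"
        secs = p[1] * 60 + p[2]    # total elapsed seconds
        printf "%.1f\n", secs / unit
    }'
}

# Elapsed time from the run_parallel 4 run, with the assumed SBU length.
to_sbu 190m43.728s 130.2
```

The same helper applied to a single-process run would need that run's own `real` time; the thread reports only the resulting SBU values for those runs.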
[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sat, 09 Jul 2005 22:00:01 -0500
From: David Jensen [EMAIL PROTECTED]
To: Bruce Dubbs [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

Bruce Dubbs wrote:

David Jensen wrote:

Bruce Dubbs wrote:

Probably we should ditch the run_parallel, just add a note that it could be used.

Not if it fills up 12G! That doesn't seem reasonable to me.

I mean used with 'number of processors'.

I've lost the thread here. I'm not sure what you mean.

Oh, I think I see the confusion: run_parallel 4 starts an overseeing process that runs 'run_std' as 4 parallel processes. It is not required, just suggested as a speed-up.

I am getting some serious loading now in 'secondary index tests'.

--
David Jensen
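If run_parallel is kept as an optional speed-up tied to the number of processors, the wrapper script could be generated instead of hard-coding the 4. This is a sketch only: `write_testit` is a hypothetical helper, `nproc` is assumed to be available (it is part of GNU coreutils), and only the Tcl lines written into the script come from this thread. The guard is there because an empty count would produce the unlimited `run_parallel run_std` form described earlier in the thread.

```shell
# write_testit OUTFILE [PROCS]
# Writes a tclsh wrapper that runs the standard tests in PROCS parallel
# processes. PROCS defaults to the processor count reported by nproc.
write_testit() {
    out="$1"
    procs="${2:-$(nproc)}"

    # Refuse anything that is not a positive integer.
    case "$procs" in
        ''|0|*[!0-9]*)
            echo "write_testit: bad process count: '$procs'" >&2
            return 1
            ;;
    esac

    cat > "$out" << EOF
#!/usr/bin/tclsh
source ../test/test.tcl
run_parallel $procs run_std
exit
EOF
    chmod +x "$out"
}
```

Calling `write_testit testit 4` in the build directory would reproduce the script used in the thread, which can then be run with `time ./testit`.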
[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sun, 10 Jul 2005 12:29:11 -0500
From: Bruce Dubbs [EMAIL PROTECTED]
To: David Jensen [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

David Jensen wrote:

Bruce Dubbs wrote:

Did you get the statement "Regression tests failed."? Looking in TESTDIR.1 I do have the line:

Yes I did, with the single run. I had no failures on the parallel runs.

    g.5 ddoyscript.tcl ./TESTDIR.1 2 6 o 5

Mine says:

    /usr/bin/tclsh8.4 ../dist/../test/wrap.tcl ./TESTDIR/dead007.log.5 ddoyscript.tcl ./TESTDIR 2 6 o 5

Yours had about half of the string clipped from the beginning, so it did not match a known pattern.

    .*?wrap\.tcl.*|

That is from test.tcl, line 319.

Hmm. It seems the only problem is the loss of that half line in the log. I wonder what could have caused that. The only thing I can think of is a bug (race condition?) in a shared library that writes the code.

I'm beginning to think they are not staying on top of this test-suite. I already reported one bug, both of us have found another. Maybe recommend not running it?

Yes. Perhaps we should write a hint detailing these issues and put a note in the book about this testing pointing to the hint. I'm not sure it really benefits users to run an 80 SBU test that seems to be flaky.

BTW, I'm going to post our messages to BLFS-dev. The thread started as a simple question, but has developed to a point where others should be able to see it.

-- Bruce
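The effect of the clipped log line can be illustrated outside the test suite. This is a simplified stand-in rather than the checker in test.tcl: it tests only the one pattern quoted in this message, and it uses grep's extended regular expressions, where an unanchored search for `wrap\.tcl` accepts the same lines as the quoted `.*?wrap\.tcl.*`. The function name `matches_known` is hypothetical.

```shell
# Return success if the log line matches the one "known" pattern
# quoted in the thread (simplified to an unanchored search).
matches_known() {
    printf '%s\n' "$1" | grep -Eq 'wrap\.tcl'
}

# The complete line from David's log and the clipped line from Bruce's.
full='/usr/bin/tclsh8.4 ../dist/../test/wrap.tcl ./TESTDIR/dead007.log.5 ddoyscript.tcl ./TESTDIR 2 6 o 5'
clipped='g.5 ddoyscript.tcl ./TESTDIR.1 2 6 o 5'

for line in "$full" "$clipped"; do
    if matches_known "$line"; then
        echo "known pattern: $line"
    else
        echo "UNEXPECTED OUTPUT: $line"
    fi
done
```

Because the clipped line has lost the part containing `wrap.tcl`, it falls through to the unexpected branch even though the underlying test did nothing wrong, which is consistent with the failure being a logging problem and not a test failure.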
[Fwd: Re: Berkeley DB]
Original Message
Subject: Re: Berkeley DB
Date: Sun, 10 Jul 2005 07:25:32 -0500
From: David Jensen [EMAIL PROTECTED]
To: Bruce Dubbs [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

Bruce Dubbs wrote:

    $ cat > testit << EOF
    #!/usr/bin/tclsh
    source ../test/test.tcl
    run_parallel 4 run_std
    EOF
    $ chmod +x testit
    $ time ./testit

I forgot to time it, however, it printed the start and end times. I have 236 SBU. That seems about right, 80 SBU with run_parallel 4.

Now, I have:

    UNEXPECTED OUTPUT: WARNING: log record type __db_pg_new: not tested
    UNEXPECTED OUTPUT: WARNING: log record type __db_pg_prepare: not tested
    Regression Tests Failed
    Check UNEXPECTED OUTPUT lines.

I did not get these two warnings in four runs of run_parallel 4, or one run_parallel 10. I will send a report to Sleepycat.

Note: parallel 4 and parallel 10 ran about the same, 80 SBU.

--
David Jensen