For 'idle in transaction' issues, you have to fix your code. I faced this issue couple of months back. How good is your exception handling? Are you rollingback/comitting your transactions while exceptions are thrown, during the course of db operations?
Honestly I wouldn't go for these scripts which kill processes. On Thu, Sep 24, 2009 at 6:20 PM, Shiva Raman <raman.shi...@gmail.com> wrote: > Hi > > Today the load observed very high load . I am pasting the top. > > *TOP * > top - 12:45:23 up 79 days, 14:42, 1 user, load average: 45.84, 33.13, > 25.84 > Tasks: 394 total, 48 running, 346 sleeping, 0 stopped, 0 zombie > Cpu(s): 49.2%us, 0.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, > 50.0%st > Mem: 16133676k total, 14870736k used, 1262940k free, 475484k buffers > Swap: 14466492k total, 124k used, 14466368k free, 11423616k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > COMMAND > 4152 postgres 17 0 2436m 176m 171m R 16 1.1 0:03.09 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4122 postgres 17 0 2431m 20m 17m R 12 0.1 0:06.38 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4007 postgres 16 0 2434m 80m 75m R 11 0.5 0:26.46 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 3994 postgres 16 0 2432m 134m 132m R 10 0.9 0:43.40 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4166 postgres 16 0 2433m 12m 8896 R 9 0.1 0:02.71 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4110 postgres 15 0 2436m 224m 217m S 8 1.4 0:06.83 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4061 postgres 16 0 2446m 491m 473m R 8 3.1 0:17.32 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4113 postgres 16 0 2432m 68m 65m R 8 0.4 0:11.03 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4071 postgres 16 0 2435m 200m 194m R 7 1.3 0:13.69 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4169 postgres 15 0 2436m 122m 117m R 7 0.8 0:00.93 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4178 postgres 16 0 2432m 77m 75m R 7 0.5 0:00.56 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4108 postgres 16 0 2437m 301m 293m R 6 1.9 0:11.94 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4155 postgres 16 0 2438m 252m 244m S 5 1.6 0:02.80 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4190 postgres 15 0 2432m 10m 8432 R 5 0.1 0:00.71 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 3906 postgres 16 0 2433m 124m 119m R 5 0.8 0:57.28 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 3970 postgres 16 0 2442m 314m 304m R 5 2.0 0:16.43 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4130 postgres 17 0 2433m 76m 72m R 5 0.5 0:03.76 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4179 postgres 16 0 2432m 105m 102m R 5 0.7 0:01.11 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4125 postgres 17 0 2436m 398m 391m R 4 2.5 0:05.62 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4162 postgres 16 0 2432m 125m 122m R 4 0.8 0:01.01 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 217m S 8 1.4 0:06.83 postgres: postgres dbEnterpriser_09_10 > 192.168.10. dbEnterpriser_09_10 192.168.10. > 4061 postgres 16 0 2446m 491m 473m R 8 3.1 0:17.32 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4113 postgres 16 0 2432m 68m 65m R 8 0.4 0:11.03 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4071 postgres 16 0 2435m 200m 194m R 7 1.3 0:13.69 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4169 postgres 15 0 2436m 122m 117m R 7 0.8 0:00.93 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4178 postgres 16 0 2432m 77m 75m R 7 0.5 0:00.56 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4108 postgres 16 0 2437m 301m 293m R 6 1.9 0:11.94 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4155 postgres 16 0 2438m 252m 244m S 5 1.6 0:02.80 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4190 postgres 15 0 2432m 10m 8432 R 5 0.1 0:00.71 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 3906 postgres 16 0 2433m 124m 119m R 5 0.8 0:57.28 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 3970 postgres 16 0 2442m 314m 304m R 5 2.0 0:16.43 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4130 postgres 17 0 2433m 76m 72m R 5 0.5 0:03.76 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4179 postgres 16 0 2432m 105m 102m R 5 0.7 0:01.11 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4125 postgres 17 0 2436m 398m 391m R 4 2.5 0:05.62 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4162 postgres 16 0 2432m 125m 122m R 4 0.8 0:01.01 postgres: > postgres dbEnterpriser_09_10 192.168.10. > 4185 postgres 1 > > *OUTPUT OF IOSTAT 1 5 (is SAN becoming a bottleneck,shows 50% CPU usage?) > * > > clusternode2:~ # iostat 1 5 > Linux 2.6.16.46-0.12-ppc64 (clusternode2) 09/24/2009 _ppc64_ (4 > CPU) > > avg-cpu: %user %nice %system %iowait %steal %idle > 16.00 0.00 0.68 0.61 10.72 71.99 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > sda 1.08 1.96 22.54 13505448 155494808 > sdb 0.00 0.20 0.45 1410179 3099920 > sdc 0.00 0.05 0.01 357404 78840 > scd0 0.00 0.00 0.00 136 0 > sdd 12.20 77.69 343.49 535925176 2369551848 > sde 0.00 0.00 0.00 1120 0 > > avg-cpu: %user %nice %system %iowait %steal %idle > 29.46 0.00 0.25 0.00 7.43 62.87 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > sda 0.00 0.00 0.00 0 0 > sdb 0.00 0.00 0.00 0 0 > sdc 0.00 0.00 0.00 0 0 > scd0 0.00 0.00 0.00 0 0 > sdd 0.00 0.00 0.00 0 0 > sde 0.00 0.00 0.00 0 0 > > avg-cpu: %user %nice %system %iowait %steal %idle > 46.17 0.00 0.99 0.00 38.52 14.32 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > sda 0.00 0.00 0.00 0 0 > sdb 0.00 0.00 0.00 0 0 > sdc 0.00 0.00 0.00 0 0 > scd0 0.00 0.00 0.00 0 0 > sdd 3.96 0.00 118.81 0 120 > sde 0.00 0.00 0.00 0 0 > > avg-cpu: %user %nice %system %iowait %steal %idle > 48.88 0.00 0.99 0.00 49.88 0.25 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > sda 0.00 0.00 0.00 0 0 > sdb 0.00 0.00 0.00 0 0 > sdc 0.00 0.00 0.00 0 0 > scd0 0.00 0.00 0.00 0 0 > sdd 0.00 0.00 0.00 0 0 > sde 0.00 0.00 0.00 0 0 > > avg-cpu: %user %nice %system %iowait %steal %idle > 47.86 0.00 2.14 0.00 50.00 0.00 > > Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > sda 0.00 0.00 0.00 0 0 > sdb 0.00 0.00 0.00 0 0 > sdc 0.00 0.00 0.00 0 0 > scd0 0.00 0.00 0.00 0 0 > sdd 0.00 0.00 0.00 0 0 > sde 0.00 0.00 0.00 0 0 > > > > > > Andy Colson Wrote : , > *Eww. I think that's bad. A connection that has a transaction open will > cause lots of row versions, which use up ram, and make it slower to step > through the table (even with an index). You really need to fix up your code > and make sure you commit transactions. (any statement (select, insert, > update) will start a new transaction that you need to explicitly commit). > > * > With reference to this suggestion by Andy Colson, we checked the > application code and found that onlyINSERT, UPDATE has COMMIT and SELECT > has no commit, We are using a lot of "Ajax Suggest" in the all the forms > accessed for fetching the data using SELECT statements which are not > explicitly commited. We have started updating the code on this. > > Thanks for this suggestion. > > > Again thanks to suggestion of Scott Marlowe in reducing the number of > connections. This was now reducted to 500 . > > > As i mentioned in the mail, i am restarting the database every 30 minutes. > I found a shell script in the wiki which could the idle in transaction pids. > This is the code. The code will kill all old pids in the server. > > This is the script > > /usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \ > > > /usr/bin/wc -l ` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction' > > and this is the link where the script was provided. > > http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem > > I tried it run it as test in the server, but the script is not executing. > Even i see many of the "Idle in transaction " PIDs are showing R (RUnning > status) , but most of them are showing S(Sleep ) status. Please suggest > anyway i can resolve this idle transaction issue. > > Regards > > Shiva Raman > > >