You can use jstat -gcutil <pid> 2000
to print the GC statistics every 2 seconds, http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jstat.html It the long pause is from GC, the columns FGCT/FGC values would be large. If you think it's a swap issue, you may want to use vmstat 1 10000 watch out the si/so columns. What's your jvm arguments? Too small heap memory size may be the issue. 2014-09-24 9:47 GMT+08:00 larry google groups <lawrencecloj...@gmail.com>: > I'm guessing that strace is showing me userland threads? When I quit > strace I see: > > ^CProcess 19363 detached > Process 19364 detached > Process 19365 detached > Process 19366 detached > Process 19367 detached > Process 19368 detached > Process 19369 detached > Process 19370 detached > Process 19371 detached > Process 19372 detached > Process 19377 detached > Process 19378 detached > Process 19379 detached > Process 19380 detached > Process 19381 detached > Process 19382 detached > Process 19383 detached > Process 19384 detached > Process 19385 detached > Process 19386 detached > Process 19387 detached > Process 19388 detached > Process 19389 detached > Process 19390 detached > Process 19391 detached > Process 19392 detached > Process 19393 detached > Process 19394 detached > Process 19395 detached > Process 19396 detached > Process 19397 detached > Process 19398 detached > Process 19399 detached > Process 19400 detached > Process 19401 detached > Process 19402 detached > Process 19403 detached > Process 19404 detached > Process 19405 detached > Process 19406 detached > Process 19407 detached > Process 19408 detached > Process 19409 detached > Process 19410 detached > Process 19606 detached > % time seconds usecs/call calls errors syscall > ------ ----------- ----------- --------- --------- ---------------- > 90.06 40.973072 1363 30059 10449 futex > 4.23 1.926411 819 2353 epoll_wait > 3.02 1.373282 114440 12 6 restart_syscall > 1.24 0.563107 93851 6 accept > 0.99 0.449988 12 36909 gettimeofday > 0.35 0.156992 5 29410 clock_gettime > 0.05 0.021064 67 316 recvfrom > 0.02 0.010338 30 347 write > 0.01 0.005117 24 209 sendto > 0.01 0.004369 24 180 poll > 0.01 0.002683 24 112 22 read > 0.01 0.002563 24 108 6 epoll_ctl > 0.00 0.001618 14 112 open > 0.00 0.001189 5 230 fcntl > 0.00 0.001132 8 142 mprotect > 0.00 0.000969 8 118 close > 0.00 0.000806 38 21 writev > 0.00 0.000685 6 109 ioctl > 0.00 0.000655 6 110 fstat > 0.00 0.000229 13 17 mmap > 0.00 0.000216 36 6 shutdown > 0.00 0.000197 33 6 dup2 > 0.00 0.000092 46 2 madvise > 0.00 0.000061 5 12 setsockopt > 0.00 0.000057 14 4 munmap > 0.00 0.000056 5 12 getsockname > 0.00 0.000035 4 8 rt_sigprocmask > 0.00 0.000018 5 4 sched_getaffinity > 0.00 0.000010 5 2 clone > 0.00 0.000009 9 1 rt_sigreturn > 0.00 0.000009 5 2 uname > 0.00 0.000009 5 2 set_robust_list > 0.00 0.000008 4 2 gettid > ------ ----------- ----------- --------- --------- ---------------- > 100.00 45.497046 100943 10483 total > > > > > > > On Tuesday, September 23, 2014 9:44:52 PM UTC-4, larry google groups wrote: >> >> I am intrigued by this article, as the problem sounds the same as mine: >> >> http://corner.squareup.com/2014/09/logging-can-be-tricky.html >> >> "No significant amount of resources appeared to be in use — disk I/O, >> network I/O, CPU, and memory all looked fairly tame. Furthermore, the bulk >> of queries being served were all performing as expected. " >> >> So I tried to follow their example regarding strace. But I have never >> worked with strace before. I used grep to find the PID and then I: >> >> sudo strace -c -f -p 19363 >> >> and I got: >> >> Process 19363 attached with 45 threads >> >> Then I ran our "health check" which is like a series of functional tests >> that ping our actual app (a live environment rather than a test >> environment). I got nothing out of strace except these 2 lines appeared: >> >> Process 20973 attached >> Process 20974 attached >> >> What does this mean? I had the impression that the JVM ran in 1 process? >> Does strace show me userland threads (like htop does) or are these child >> processes? >> >> >> >> >> >> On Monday, September 15, 2014 12:15:14 AM UTC-4, larry google groups >> wrote: >>> >>> >>> I have an embarrassing problem. I convinced my boss that I could use >>> Clojure to build a RESTful API. I was successful in so far as that went, >>> but now I face the issue that every once in a while, the program pauses, >>> for a painfully long time -- sometimes 30 seconds, which causes some >>> requests to the API to timeout. We are still in testing, so there is no >>> real load on the app, just the frontenders, writing Javascript and making >>> Ajax calls to the service. >>> >>> The service seems like a basic Clojure web app. I use Jetty as the >>> webserver, and the libraries in use are: >>> >>> Ring >>> >>> Compojure >>> >>> Liberator >>> >>> Monger >>> >>> Timbre >>> >>> Lamina >>> >>> Dire >>> >>> When someone complains about the pauses, I will go test the service, and >>> I can hit with 40 requests in 10 seconds and it has great performance. The >>> pauses actually seem to come after periods of inactivity, which made me >>> think that this had something to do with garbage collection, except that >>> the pauses are so extreme -- like I said, sometimes as much as 30 seconds, >>> causing requests to timeout. When the app does finally start to respond it >>> again, it goes very fast, and responds to those pending request very fast. >>> >>> But I have to find a way to fix these pauses. >>> >>> Right now I packaged the app as an Uberjar and put it on the server, >>> spun it up on port 24000 and proxied it through Apache. I put a script in >>> /etc/init.d to start the app using start-stop-daemon. >>> >>> Possible things that could be going wrong: >>> >>> Maybe Jetty needs more threads, or maybe less threads? How would I test >>> that? >>> >>> Maybe the link to MongoDB sometimes dies? (Mongo is on another server at >>> Amazon) How would I test that? >>> >>> Maybe it is garbage collection? How would I test that? >>> >>> Maybe I have some code that somehow blocks the whole app? Seems unlikely >>> but I'm trying to keep an open mind. >>> >>> Maybe the thread pool managed by Lamina sometimes gets overwhelmed? How >>> would I test that? >>> >>> Maybe when Timbre writes to the log file it causes things to pause? (But >>> I believe Timbre does this in its own thread?) How do I test that? >>> >>> This is a small app: only about 1,100 lines of code. >>> >>> I don't have much experience debugging problems on the JVM, so I welcome >>> any suggestions. >>> >>> >>> >>> >>> >>> >>> >>> -- > You received this message because you are subscribed to the Google > Groups "Clojure" group. > To post to this group, send email to clojure@googlegroups.com > Note that posts from new members are moderated - please be patient with > your first post. > To unsubscribe from this group, send email to > clojure+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/clojure?hl=en > --- > You received this message because you are subscribed to the Google Groups > "Clojure" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clojure+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -- 庄晓丹 Email: killme2...@gmail.com xzhu...@avos.com Site: http://fnil.net Twitter: @killme2008 -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.