[ https://issues.apache.org/jira/browse/CASSANDRA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731797#comment-14731797 ]
Anuj Wadehra commented on CASSANDRA-8907:
-----------------------------------------

[~johnny15676] Got your point! I think there are 2 scenarios:

1. Suppose I set it to some value, say 200ms, and my application was comfortable with it. If I then upgrade to a minor version, I will 'break' my warning system (as you said), because the warnings were undesirable.

2. My nodes are going down intermittently due to long GC pauses (20+ secs), but my warning system is comfortable and not reporting any issue. This is a BUG. If I upgrade with the default value of this property set to 20000ms and start getting these warnings, I won't call it 'breaking' my warning system: it is a serious issue, and otherwise my nodes will go DOWN intermittently without raising warnings.

So, I advocate targeting scenario 2, where this property is enabled by default and set to an unreasonably high value (20000+ ms), so that I don't break existing warning systems (as I can't guess whether 100ms or 200ms or 1000ms is comfortable for an application). At the same time, I raise the warning when there is a serious chance of the node being marked down. Any user upgrading to a minor version will have the option to decrease the value based on their application requirements or leave it as it is.

Moreover, if you agree with the above opinion, I would suggest that tpstats should be logged at min(1000ms, gc warn threshold). If a user is sensitive to gc pauses, they will reduce the gc warn threshold to a lower value, e.g. 100ms, and will then want to see diagnostic tpstats info every time a gc pause over 100ms occurs. If the user doesn't change the HUGE default gc warn limit (20000+ ms), we would stick to the existing behaviour, i.e. dump tpstats at gc pauses of more than 1000ms, to avoid breaking the existing way of dumping tpstats.
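To make the min(1000ms, gc warn threshold) suggestion concrete, here is a minimal Java sketch of the two decisions. It is only an illustration under assumptions: the class, constant and method names are hypothetical and are not taken from GCInspector or from the attached patch, and the 20000ms default is just the value proposed above.

```java
// Hypothetical sketch; none of these names exist in Cassandra.
public class GcThresholdSketch {
    // Proposed (assumed) default: high enough that only pauses likely to get a node marked down warn.
    static final long DEFAULT_GC_WARN_THRESHOLD_MS = 20_000;
    // Existing behaviour described above: tpstats are dumped for pauses over 1000ms.
    static final long TPSTATS_CEILING_MS = 1_000;

    // tpstats threshold = min(1000ms, gc warn threshold)
    static long tpstatsThresholdMs(long gcWarnThresholdMs) {
        return Math.min(TPSTATS_CEILING_MS, gcWarnThresholdMs);
    }

    // A pause is logged at WARN only when it exceeds the configured warn threshold.
    static boolean shouldWarn(long pauseMs, long gcWarnThresholdMs) {
        return pauseMs > gcWarnThresholdMs;
    }

    // tpstats are dumped whenever the pause exceeds the derived tpstats threshold.
    static boolean shouldLogTpstats(long pauseMs, long gcWarnThresholdMs) {
        return pauseMs > tpstatsThresholdMs(gcWarnThresholdMs);
    }

    public static void main(String[] args) {
        // Untouched default: a 1835ms CMS pause dumps tpstats but does not warn.
        System.out.println(shouldWarn(1835, DEFAULT_GC_WARN_THRESHOLD_MS));
        System.out.println(shouldLogTpstats(1835, DEFAULT_GC_WARN_THRESHOLD_MS));
        // User lowered the threshold to 100ms: a 150ms pause warns and dumps tpstats.
        System.out.println(shouldWarn(150, 100));
        System.out.println(shouldLogTpstats(150, 100));
    }
}
```

With the default left untouched, the tpstats behaviour stays exactly as it is today; lowering the warn threshold below 1000ms pulls the tpstats threshold down with it.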
Small concern so we can quickly discuss and close it :)

> Raise GCInspector alerts to WARN
> --------------------------------
>
>                 Key: CASSANDRA-8907
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8907
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Adam Hattrell
>            Assignee: Amit Singh Chowdhery
>              Labels: patch
>         Attachments: cassnadra-8907.patch
>
>
> I'm fairly regularly running into folks wondering why their applications are
> reporting down nodes. Yet, they report, when they grepped the logs they have
> no WARN or ERRORs listed.
> Nine times out of ten, when I look through the logs we see a ton of ParNew or
> CMS gc pauses occurring similar to the following:
> INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 GCInspector.java (line 122)
> GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max
> is 10611589120
> INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 GCInspector.java (line 122)
> GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
> To my mind these should be WARN's as they have the potential to be
> significantly impacting the clusters performance as a whole.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)