[ 
https://issues.apache.org/jira/browse/CASSANDRA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731797#comment-14731797
 ] 

Anuj Wadehra commented on CASSANDRA-8907:
-----------------------------------------

[~johnny15676] Got your point! I think there are two scenarios:

1. Suppose the threshold is set to some value, say 200ms, and my application was 
comfortable with pauses of that length. If I then upgrade to a minor version, I 
will 'break' my warning system (as you said), because the new Warnings are 
undesirable.

2. My nodes are going down intermittently due to long GC pauses (20+ secs), 
but my warning system is comfortable and is not reporting any issue. This is a 
BUG. If I upgrade and the default value of this property is 20000ms, I will 
start getting these Warnings. I won't call that 'breaking' my warning system, 
because this is a serious issue: otherwise my nodes will go DOWN intermittently 
without raising any Warnings.

So, I advocate targeting scenario 2, where this property is enabled by default 
and set to an unreasonably high value (20000+ ms), so that I don't break 
existing warning systems (as I can't guess whether 100ms, 200ms, or 1000ms is 
comfortable for an application). At the same time, I raise the warning when 
there is a serious chance of the node being marked down.

Any user upgrading to a minor version will have the option to decrease the 
value based on his application requirements or leave it as it is. 

Moreover, if you agree with the opinion above, I would suggest that tpstats 
should be logged at min(1000ms, gc warn threshold). If a user is sensitive to 
gc pauses, he will reduce the gc warn threshold to a lower value, e.g. 100 ms, 
and will then want to see diagnostic tpstats info every time a gc pause over 
100ms occurs. If the user doesn't change the HUGE default gc warn limit 
(20000+), we would stick to the existing behavior, i.e. dump tpstats at gc 
pauses of more than 1000ms, to avoid breaking the existing way of dumping 
tpstats.
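
To make the min(1000ms, gc warn threshold) rule concrete, here is a minimal 
sketch in Java. It is not taken from the attached patch: the class, method, and 
constant names are hypothetical, and the strict "greater than" comparison is 
an assumption based on the "more than 1000ms" wording above.

```java
// Hypothetical sketch of the proposed rule; none of these names come from
// Cassandra's GCInspector or from the attached patch.
class GcThresholdSketch {
    // Existing behavior described above: tpstats are dumped for pauses over 1000 ms.
    static final long LEGACY_TPSTATS_THRESHOLD_MS = 1000L;

    // Proposed default for the warn threshold, high enough not to disturb
    // existing warning systems (assumed value, taken from the 20000 ms figure above).
    static final long PROPOSED_DEFAULT_WARN_THRESHOLD_MS = 20000L;

    // tpstats are logged at min(1000 ms, gc warn threshold).
    static long tpstatsThresholdMs(long gcWarnThresholdMs) {
        return Math.min(LEGACY_TPSTATS_THRESHOLD_MS, gcWarnThresholdMs);
    }

    // A pause is reported at WARN only when it exceeds the configured threshold.
    static boolean shouldWarn(long pauseMs, long gcWarnThresholdMs) {
        return pauseMs > gcWarnThresholdMs;
    }

    // tpstats are dumped when the pause exceeds the effective tpstats threshold.
    static boolean shouldLogTpstats(long pauseMs, long gcWarnThresholdMs) {
        return pauseMs > tpstatsThresholdMs(gcWarnThresholdMs);
    }

    public static void main(String[] args) {
        long pauseMs = 1835L; // the CMS pause from the log excerpt in the issue
        System.out.println("warn with default threshold: "
                + shouldWarn(pauseMs, PROPOSED_DEFAULT_WARN_THRESHOLD_MS));
        System.out.println("tpstats with default threshold: "
                + shouldLogTpstats(pauseMs, PROPOSED_DEFAULT_WARN_THRESHOLD_MS));
    }
}
```

With the untouched 20000ms default, the effective tpstats threshold stays at 
1000ms, so tpstats dumping behaves as it does today. With the threshold lowered 
to 100ms, both the Warning and the tpstats dump trigger for pauses over 100ms.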

Small concern so we can quickly discuss and close it :)

> Raise GCInspector alerts to WARN
> --------------------------------
>
>                 Key: CASSANDRA-8907
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8907
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Adam Hattrell
>            Assignee: Amit Singh Chowdhery
>              Labels: patch
>         Attachments: cassnadra-8907.patch
>
>
> I'm fairly regularly running into folks wondering why their applications are 
> reporting down nodes.  Yet, they report, when they grep the logs they find 
> no WARNs or ERRORs listed.
> Nine times out of ten, when I look through the logs we see a ton of ParNew or 
> CMS gc pauses occurring similar to the following:
> INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 GCInspector.java (line 122) 
> GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max 
> is 10611589120
> INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 GCInspector.java (line 122) 
> GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
> To my mind these should be WARNs, as they have the potential to 
> significantly impact the cluster's performance as a whole.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
