[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523037#comment-14523037 ] Benedict commented on CASSANDRA-8584: - [~aweisberg]: your patch changes the behaviour of the logging of the commit log sync, so that we drop information. Our NoSpamLogger should perhaps have a ticket system, so that if we get the right to log we know it (and can reset)? Previously every 5m we printed an aggregation of the problematic state and reset our counters; now we print and reset without knowing if the data got printed. So the problem of CL being behind on sync could be much worse than an operator realises. > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Ariel Weisberg >Priority: Trivial > Fix For: 2.1.x > > Attachments: 8584_v1.txt, NoSpamLogger.java, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523181#comment-14523181 ] Joshua McKenzie commented on CASSANDRA-8584: It looks like the commit log sync is scope-creep (on a ticket that's already been extended pretty far by scope creep). Maybe work that separately?
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531442#comment-14531442 ] Ariel Weisberg commented on CASSANDRA-8584: [~JoshuaMcKenzie] If you'll tolerate it I would like to just do both. A dedicated branch and JIRA for the change to ACS seems excessive. [~benedict] I updated it to only reset the stat if the log statement fired. Does that make sense to you or did you mean something more involved when you mentioned ticket system?
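The "only reset the stat if the log statement fired" idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class name `SyncLagReporter`, its fields, the message wording, and the injected clock are all hypothetical, and a plain `Consumer<String>` stands in for the real logger.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch: counters are cleared only when the throttled log line is actually emitted. */
final class SyncLagReporter {
    private final Consumer<String> sink;   // stands in for a real logger
    private final long intervalNanos;      // minimum spacing between emitted lines
    private final LongSupplier nanoClock;  // injectable clock so the sketch can be tested
    private long nextAllowedNanos;
    private long lagCount;
    private long totalLagMillis;

    SyncLagReporter(Consumer<String> sink, long intervalNanos, LongSupplier nanoClock) {
        this.sink = sink;
        this.intervalNanos = intervalNanos;
        this.nanoClock = nanoClock;
        this.nextAllowedNanos = nanoClock.getAsLong();
    }

    /** Record one lagging sync; report and reset only if the throttle lets the message through. */
    synchronized void recordLag(long lagMillis) {
        lagCount++;
        totalLagMillis += lagMillis;
        String msg = "commit log sync lagged " + lagCount + " times, " + totalLagMillis + " ms total";
        if (tryLog(msg)) {
            // The aggregate reached the log, so it is safe to start a fresh window.
            lagCount = 0;
            totalLagMillis = 0;
        }
        // Otherwise keep accumulating so nothing is lost while the message is suppressed.
    }

    /** Returns true if the message was emitted, false if it was throttled. */
    private boolean tryLog(String msg) {
        long now = nanoClock.getAsLong();
        if (now - nextAllowedNanos < 0) {
            return false;
        }
        nextAllowedNanos = now + intervalNanos;
        sink.accept(msg);
        return true;
    }
}
```

With this shape, a suppressed report carries its counts forward into the next emitted line instead of silently dropping them.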
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531825#comment-14531825 ] Joshua McKenzie commented on CASSANDRA-8584: I'll tolerate it. Just update the ticket to reflect the increased scope - it's your baby now. ;)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532335#comment-14532335 ] Benedict commented on CASSANDRA-8584: That looks fine. I have one more suggestion, though (not an important one): It would _sometimes_ be helpful for the NoSpamLogger to accept a second guard parameter. For instance, the original goal of this ticket (failed skip cache) could have different error reasons. It might be helpful to guard them independently, to avoid suppressing useful information. Not exactly required, given the non-essential nature of the call, but it could be helpful information for the operator we're suppressing.
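One way to read the "second guard parameter" suggestion above is a throttle keyed by an extra value, such as the errno returned by the failed call, so that each error reason gets its own rate limit. The sketch below assumes that reading; `KeyedThrottledLog`, its `warn(guard, message)` signature, and the injected clock are hypothetical and are not the API of the attached NoSpamLogger.java.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch: each guard value (e.g. an errno) is throttled independently. */
final class KeyedThrottledLog {
    private final ConcurrentHashMap<Object, AtomicLong> nextAllowedByGuard = new ConcurrentHashMap<>();
    private final Consumer<String> sink;   // stands in for a real logger
    private final long intervalNanos;
    private final LongSupplier nanoClock;  // injectable clock so the sketch can be tested

    KeyedThrottledLog(Consumer<String> sink, long intervalNanos, LongSupplier nanoClock) {
        this.sink = sink;
        this.intervalNanos = intervalNanos;
        this.nanoClock = nanoClock;
    }

    /** Returns true if the message was emitted, false if this guard is still inside its quiet window. */
    boolean warn(Object guard, String message) {
        long now = nanoClock.getAsLong();
        // The first message for a guard is always allowed: its window starts at "now".
        AtomicLong next = nextAllowedByGuard.computeIfAbsent(guard, g -> new AtomicLong(now));
        long expected = next.get();
        if (now - expected < 0 || !next.compareAndSet(expected, now + intervalNanos)) {
            return false; // throttled, or another thread won the race to log
        }
        sink.accept(message);
        return true;
    }
}
```

The map grows with the number of distinct guard values, so this only makes sense when the guards come from a small, bounded set such as error codes.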
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347101#comment-14347101 ] Benedict commented on CASSANDRA-8584: - +1 I have a slight concern we might spam the log file if there's something systematically wrong, but I also don't want to pollute our code too much. I wonder if we should create a static utility class for logging messages that we don't want to be spammy; we already have at least one place we impose a no-spam rule, and I'm sure we have others we should be and will. > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Fix For: 2.1.4 > > Attachments: 8584_v1.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347127#comment-14347127 ] Benedict commented on CASSANDRA-8584: bq. Providing a stack or filename (or both) with the error SGTM bq. What's the other context in which we impose no-spam, for reference? https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogService.java#L103 Perhaps we should have a NoSpamLogger utility class that wraps a Logger and a minimum time interval, exposes just the basic info/warn/error(String s, Object... objects) methods, and drops any messages that arrive within time interval of the last logged message...?
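The utility proposed in this comment (wrap a logger plus a minimum interval, drop anything that arrives too soon after the last logged message) might look roughly like the sketch below. It is an illustration under assumptions, not the nospamlogger.txt or NoSpamLogger.java attachment: the name `IntervalThrottledLog` is hypothetical, only `warn` is shown, a `Consumer<String>` stands in for the wrapped Logger, and the clock is injected purely to make the behaviour testable.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a logger wrapper that drops messages inside a minimum interval. */
final class IntervalThrottledLog {
    private final Consumer<String> sink;       // stands in for the wrapped Logger
    private final long minIntervalNanos;
    private final LongSupplier nanoClock;      // e.g. System::nanoTime in real use
    private final AtomicLong nextAllowedNanos; // earliest time the next message may be logged

    IntervalThrottledLog(Consumer<String> sink, long minInterval, TimeUnit unit, LongSupplier nanoClock) {
        this.sink = sink;
        this.minIntervalNanos = unit.toNanos(minInterval);
        this.nanoClock = nanoClock;
        this.nextAllowedNanos = new AtomicLong(nanoClock.getAsLong());
    }

    /** Returns true if the message was emitted, false if it was dropped. */
    boolean warn(String message) {
        long now = nanoClock.getAsLong();
        long next = nextAllowedNanos.get();
        if (now - next < 0) {
            return false; // still inside the quiet window
        }
        if (!nextAllowedNanos.compareAndSet(next, now + minIntervalNanos)) {
            return false; // another thread logged first; treat this one as a duplicate
        }
        sink.accept("WARN " + message);
        return true;
    }
}
```

Returning a boolean is a design choice in this sketch: it lets a caller find out whether its message actually reached the log.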
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347117#comment-14347117 ] Joshua McKenzie commented on CASSANDRA-8584: Thinking back to when I added this, I think we could use some more color surrounding the error; the strerror output will tell us "invalid file handle" or some such without specifying the actual file it failed on. Providing a stack or filename (or both) with the error + throttling the # of messages would make this a more robust addition I think. What's the other context in which we impose no-spam, for reference?
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347162#comment-14347162 ] Jonathan Ellis commented on CASSANDRA-8584: --- Doesn't logback build this in somewhere? /cc [~dbrosius] > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Fix For: 2.1.4 > > Attachments: 8584_v1.txt, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348153#comment-14348153 ] Dave Brosius commented on CASSANDRA-8584: logback has DuplicateMessageFilter http://logback.qos.ch/manual/filters.html#DuplicateMessageFilter but these are exact matches, not just base pattern matches.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355503#comment-14355503 ] Joshua McKenzie commented on CASSANDRA-8584: [Branch here|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:8584]
* Integrated attached logger w/slight rename
* Changed trySkipCache to take path and print on error
* Trivial rename in SSTableRewriter for clarity on which fileDescriptors we were storing
I can't seem to reproduce the original failure on trySkipCache that led to this ticket. Good to have this around for future changes though.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355520#comment-14355520 ] Benedict commented on CASSANDRA-8584: Can we simply pass a String path to trySkipCache, and just use reader.getFilename()? It seems to be what we use elsewhere for logging, and it is perhaps a little neater. It's probably worth porting the log message to use the varargs parameter of the warn() method. If we're renaming the descriptors, perhaps call them sourceFileDescriptors? or sourceDataComponentFileDescriptors? Or something along those lines... Seems that the fact they're source, not target, files is valuable information for a future reader. Otherwise LGTM +1
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355606#comment-14355606 ] Joshua McKenzie commented on CASSANDRA-8584: bq. Can we simply pass a String path to trySkipCache... That works. CommitLogSegment doesn't have getFilename but we can reconstruct from logFile easily enough and that's much cleaner. bq. If we're renaming the descriptors... Renamed to sourceDataDescriptors. I think File is implicit in the context. bq. It's probably worth porting the log message to use the varargs parameter of the warn() method. Not exactly sure what you mean as ThrottledLogger.warn() was using varargs, and why only warn()... But if you meant to include Object[] in the hash for throttling in the log *method*, I added that and added a unit test to sanity check as well. Branch updated.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355615#comment-14355615 ] Benedict commented on CASSANDRA-8584: bq. But if you meant to include Objects[] in the hash for throttling in the log method, I added that and added a unit test to sanity check as well. I think that would actually obviate much of the throttling benefit :) I meant simply to avoid string concatenation, and use {} logging replacement syntax
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355691#comment-14355691 ] Joshua McKenzie commented on CASSANDRA-8584: bq. I think that would actually obviate much of the throttling benefit You know, I think you're right. I was thinking in terms of throttling individual errors rather than *classes* of errors but that level of specificity will just put us largely back where we started. Reverted. bq. I meant simply to avoid string concatenation, and use {} logging replacement syntax It took me *far* longer than it should have to realize you were referring to the log message in CLibrary.java. I was trying to figure out how our call on wrapped.warn(s, objects) in ThrottledLogger was somehow not using the varargs signature... /sigh. That one's on me - apparently today is my day to be obtuse. Hashing changes simplified, unit test pared down, log message avoiding concatenation. Branch updated.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366954#comment-14366954 ] Benedict commented on CASSANDRA-8584: Sorry for the delay, missed this in my work queue. LGTM. One nit to consider on commit is if we should increase the window for throttling - an error doing something like this probably doesn't need to be reported secondly, and probably not even minutely. Probably 10m+ is more like it IMO.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378061#comment-14378061 ] Benedict commented on CASSANDRA-8584: - The reason I didn't go all out is I actually think it's detrimental to optimise something like this: it's intended to never run under normal operation; if it does run, it will almost never be contended (we're trying to not-spam by secondly intervals, not nanosecondly); and the logging is itself inherently blocking (as stands), and since that is hit on every log command, not just the no-spam paths, if any contention will occur, it will likely occur there. So, if we care at all about being non-blocking here, we should start by making our logging non-blocking, otherwise this is just for warm fuzzy feels. I don't mind terribly, but I generally am opposed to even minor changes that only _appear_ to deliver a benefit, since it sends potentially erroneous signals to readers of the code that this matters here. > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Fix For: 2.1.4 > > Attachments: 8584_v1.txt, NoSpamLogger.java, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378511#comment-14378511 ] Ariel Weisberg commented on CASSANDRA-8584: I think there are use cases for rate limited logging that are wider. It's usually canary log statements where something undesirable is happening hundreds to thousands of times a second and you generally expect that not to be happening. You may only want to log once an hour or day to remind the operator this is happening. I don't mind if it doesn't go in now. Better to make the case when there is a concrete problem to solve.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378574#comment-14378574 ] Benedict commented on CASSANDRA-8584: I'll note it's a preference, and not a terribly strong one, so if you _do_ want to put it in I won't -1. But I would rather give some thought to logging as a whole if we care about this kind of issue.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378624#comment-14378624 ] Jeremy Hanna commented on CASSANDRA-8584: So pre-logback, all of this would go into the output.log because that's where stderr goes. I presume it's the same for logback. The output.log never rolls on a running server. Not sure if that affects how we handle this, but just wanted to make sure it wouldn't fill up the volume eventually.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378678#comment-14378678 ] Robert Stupp commented on CASSANDRA-8584: I'm pro and contra regarding log-throttling. Pro, because it may limit the amount of "spam" in the log file (e.g. hundreds of "batch too big" messages). Contra, because it may hide the importance of a message (amount of messages _is_ the level of importance). OTOH - who really *reads* log files (especially in a big cluster)? What I want to say is: is a log _file_ really the place where _important_ messages should go? I think there are some solutions out there that do some log file scanning and aggregation and alerting. TL;DR IMO we should stick with system.log but think of something or use/add support for something that really adds value for operators of both small and big clusters.
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378728#comment-14378728 ] Ariel Weisberg commented on CASSANDRA-8584: Throttling isn't about reducing the amount of information in the log. It's about making it possible to include information that you wouldn't be able to without some kind of coordination that ensures you don't flood the log. This isn't a proposal to throttle all messages by default. It's opt-in for special cases where you know it could be disruptive. We should really be talking about this in CASSANDRA-9029
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378733#comment-14378733 ] Benedict commented on CASSANDRA-8584: - bq. who really reads log files It seems like this conversation is heading very much towards that general analysis of logging, which is probably best to happen on its own ticket. At the very least on CASSANDRA-9029, which will be more clearly related to any general topic. I agree that logging is not the best way to manage bad cluster states, since most people don't monitor their logs. Unfortunately we have no better (or any other) way of alerting users to significant problems. This is perhaps a third line of enquiry to open: is there a better way for us to report significant events that cluster owners should respond to? I'm not aware of a good standardised API for this. bq. amount of messages is is the level of importance If we choose to expand the NoSpamLogger's functionality, I also agree it makes sense for it to support aggregation of log messages over some time horizon, so if any log messages are suppressed, a tally count of those suppressed is periodically emitted. But I don't consider this super pressing. Seeing these messages every minute in the log is notification enough that the problem is prevalent. > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Fix For: 2.1.4 > > Attachments: 8584_v1.txt, NoSpamLogger.java, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. 
That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
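The aggregation idea in the comment above (suppress repeats within a time window, then periodically emit a tally of what was suppressed) can be sketched roughly as follows. This is only an illustrative sketch, not the NoSpamLogger attached to this ticket: the class name TallyingThrottle, the offer method, and the wording of the tally suffix are all hypothetical, and the caller is assumed to pass non-negative elapsed nanoseconds from a clock of its choosing.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a throttle that tallies suppressed messages; not Cassandra's NoSpamLogger. */
final class TallyingThrottle
{
    private final long minIntervalNanos;
    // Earliest timestamp at which the next message may be emitted.
    // Assumption: callers pass non-negative elapsed nanos, so 0 lets the first message through.
    private final AtomicLong nextAllowedNanos = new AtomicLong(0);
    // Messages dropped since the last one that was actually emitted.
    private final AtomicLong suppressed = new AtomicLong();

    TallyingThrottle(long minInterval, TimeUnit unit)
    {
        this.minIntervalNanos = unit.toNanos(minInterval);
    }

    /**
     * Returns the text to log, with a tally of suppressed messages appended
     * when there were any, or null if this message should be suppressed.
     */
    String offer(long nowNanos, String message)
    {
        long next = nextAllowedNanos.get();
        // Only the caller that wins the compare-and-set may log; every other caller is counted.
        if (nowNanos < next || !nextAllowedNanos.compareAndSet(next, nowNanos + minIntervalNanos))
        {
            suppressed.incrementAndGet();
            return null;
        }
        // The tally is reset only by the caller that is about to emit it.
        long dropped = suppressed.getAndSet(0);
        return dropped == 0 ? message : message + " (" + dropped + " similar messages suppressed)";
    }
}
```

Returning the text rather than logging it directly keeps the sketch independent of any logging framework; a caller would log the result only when it is non-null.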
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378738#comment-14378738 ] Robert Stupp commented on CASSANDRA-8584: - Maybe I've expressed myself a bit unclearly: Some people tend to think that a message which appears very often must be very important. It's more a psychological than a technical thing. (Heading over to CASSANDRA-9029) > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Fix For: 2.1.4 > > Attachments: 8584_v1.txt, NoSpamLogger.java, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378739#comment-14378739 ] Jeremy Hanna commented on CASSANDRA-8584: - FWIW, many users with small to moderately sized clusters read the log files (say up to 100 nodes). Beyond that I imagine many users are taking advantage of tools like logstash/kibana and splunk. Also, the output.log isn't as commonly viewed, as people IME assume that everything is in the system.log. > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Trivial > Fix For: 2.1.4 > > Attachments: 8584_v1.txt, NoSpamLogger.java, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394779#comment-14394779 ] Joshua McKenzie commented on CASSANDRA-8584: Removed NoSpamLogger in deference to CASSANDRA-9029. A quick run against 2.1-HEAD w/this patch gives: {noformat} grep trySkipCache 8584_utest.txt | wc -l 432 {noformat} I'll either wait until 9029's in or track down the source of the failing trySkipCache calls and create another ticket for that. I'd prefer to have a clean slate w/regards to our page-cache prompting before committing this. > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Ariel Weisberg >Priority: Trivial > Fix For: 2.1.5 > > Attachments: 8584_v1.txt, NoSpamLogger.java, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8584) Add strerror output on failed trySkipCache calls
[ https://issues.apache.org/jira/browse/CASSANDRA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493197#comment-14493197 ] Ariel Weisberg commented on CASSANDRA-8584: --- Created a branch for this using NoSpamLogger https://github.com/apache/cassandra/compare/trunk...aweisberg:C-8584?expand=1 > Add strerror output on failed trySkipCache calls > > > Key: CASSANDRA-8584 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8584 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Ariel Weisberg >Priority: Trivial > Fix For: 2.1.5 > > Attachments: 8584_v1.txt, NoSpamLogger.java, nospamlogger.txt > > > Since trySkipCache returns an errno rather than -1 and setting errno like our > other CLibrary calls, it's thread-safe and we could print out more helpful > information if we failed to prompt the kernel to skip the page cache. That > system call should always succeed unless we have an invalid fd as it's free > to ignore us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)