[jira] [Created] (CASSANDRA-14192) netstats information mismatch between senders and receivers
Jonathan Ballet created CASSANDRA-14192:
---

             Summary: netstats information mismatch between senders and receivers
                 Key: CASSANDRA-14192
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14192
             Project: Cassandra
          Issue Type: Bug
          Components: Observability
            Reporter: Jonathan Ballet

When adding a new node to an existing cluster, the {{netstats}} command, run while the node is joining, shows different statistics on the node receiving the data and on the nodes sending the data.

Receiving node:
{code}
Mode: JOINING
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
    /172.20.13.184
    /172.20.30.7
        Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total
[...]
    /172.20.40.128
    /172.20.16.45
        Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total
[...]
    /172.20.9.63
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0              0         0
Small messages                  n/a         0          11121         0
Gossip messages                 n/a         0          32690         0
{code}

Sending node 1:
{code}
Mode: NORMAL
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
    /172.20.21.19
        Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total
[...]
Read Repair Statistics:
Attempted: 680832
Mismatch (Blocking): 716
Mismatch (Background): 279
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         2         123307         4
Small messages                  n/a         2      637010302       509
Gossip messages                 n/a        23         798851     11535
{code}

Sending node 2:
{code}
Mode: NORMAL
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
    /172.20.21.19
        Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total
[...]
Read Repair Statistics:
Attempted: 84967
Mismatch (Blocking): 17568
Mismatch (Background): 3078
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         2          17818         2
Small messages                  n/a         2      126082304       507
Gossip messages                 n/a        34         202810     11725
{code}

In this case, the join process has been running for a while and the sending nodes report that they have already sent everything. This output stays the same for a while, though (maybe ~15% of the total joining time).

However, once the sending nodes have sent everything, the receiving node's values stay as shown above until the node switches to the {{NORMAL}} state. There is no visible catching up from ~86 files to ~405 files, for example: the node goes directly from the state shown above to {{NORMAL}}.

This makes tracking the progress of the join process more difficult than it needs to be, because we have to compare the receiving node's values with the sending nodes' values and deduce the actual state from both, and both are "not correct": the sending nodes say everything has been sent but stay in this state for a long time, while the receiving node says it still needs to download a lot of files/data before finishing.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
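
For readers who want to compare both sides of a stream while this issue is open, the summary lines quoted in the report can be turned into completion ratios. The following is only a rough sketch: the function name `stream_progress`, the regular expression, and the unit table are illustrative assumptions based solely on the sample output in this report, not part of nodetool or Cassandra, and other Cassandra versions may format these lines differently.

```python
import re

# Pattern derived only from the summary lines quoted in the report, e.g.
# "Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total"
SUMMARY = re.compile(
    r"(?P<direction>Receiving|Sending) (?P<total_files>\d+) files, "
    r"(?P<total_size>[\d.]+) (?P<total_unit>\w+) total\. "
    r"Already (?:received|sent) (?P<done_files>\d+) files, "
    r"(?P<done_size>[\d.]+) (?P<done_unit>\w+) total"
)

# Assumption: sizes are printed with binary units, as in the sample ("GiB").
UNITS = {"bytes": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def stream_progress(netstats_text):
    """Return one dict per summary line with file and byte completion ratios."""
    results = []
    for m in SUMMARY.finditer(netstats_text):
        total_files = int(m["total_files"])
        total_bytes = float(m["total_size"]) * UNITS[m["total_unit"]]
        done_bytes = float(m["done_size"]) * UNITS[m["done_unit"]]
        results.append({
            "direction": m["direction"],
            # Guard against empty streams to avoid dividing by zero.
            "files": int(m["done_files"]) / total_files if total_files else 1.0,
            "bytes": done_bytes / total_bytes if total_bytes else 1.0,
        })
    return results
```

Applied to the receiving node's first summary line above, this gives a file ratio of 88/433 (about 20%), while the matching sender line gives 1.0 for both files and bytes, which is exactly the mismatch described in this report.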
[jira] [Commented] (CASSANDRA-12503) Structure for netstats output format (JSON, YAML)
[ https://issues.apache.org/jira/browse/CASSANDRA-12503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279984#comment-16279984 ]

Jonathan Ballet commented on CASSANDRA-12503:
---

[~nelio] Have you made any progress on this recently?
I'm quite interested in having this, so if needed, I'm willing to take over the patch and apply the changes requested by [~yukim].

> Structure for netstats output format (JSON, YAML)
> ---
>
>                 Key: CASSANDRA-12503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12503
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Hiroki Watanabe
>            Assignee: Hiroki Watanabe
>            Priority: Minor
>             Fix For: 3.11.x
>
>         Attachments: new_receiving.def, new_receiving.json,
> new_receiving.yaml, new_sending.def, new_sending.json, new_sending.yaml,
> old_receiving.def, old_sending.def, trunk.patch
>
>
> As with nodetool tpstats and tablestats (CASSANDRA-12035), nodetool netstats
> should also support useful output formats such as JSON or YAML, so we
> implemented it.
> Please review the attached patch.
[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205766#comment-16205766 ]

Jonathan Ballet commented on CASSANDRA-13649:
---

I also noticed the frequency of this error message increased a lot after upgrading to 3.11.1:

{code:java}
INFO [epollEventLoopGroup-2-3] 2017-10-16 10:00:37,592 Message.java:623 - Unexpected exception during request; channel = [id: 0xb253764e, L:/10.40.3.15:9042 - R:/10.30.0.10:58996]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
    at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
{code}

> Uncaught exceptions in Netty pipeline
> ---
>
>                 Key: CASSANDRA-13649
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging, Testing
>            Reporter: Stefan Podkowinski
>            Assignee: Norman Maurer
>              Labels: patch
>             Fix For: 2.2.11, 3.0.15, 3.11.1, 4.0
>
>         Attachments:
> 0001-CASSANDRA-13649-Ensure-all-exceptions-are-correctly-.patch,
> test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
> Just want to make sure that we don't have to change anything related to the
> exception handling in our pipeline and that this isn't a netty issue.
> Actually if this causes flakiness but is otherwise harmless, we should do
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151
> - An exceptionCaught() event was fired, and it reached at the tail of the
> pipeline. It usually means the last handler in the pipeline did not handle
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed:
> Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151
> - An exceptionCaught() event was fired, and it reached at the tail of the
> pipeline. It usually means the last handler in the pipeline did not handle
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed:
> Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed}} error also causes tests to fail for 3.0 and 3.11.