[jira] [Created] (CASSANDRA-14192) netstats information mismatch between senders and receivers

2018-01-25 Thread Jonathan Ballet (JIRA)
Jonathan Ballet created CASSANDRA-14192:
---

 Summary: netstats information mismatch between senders and receivers
 Key: CASSANDRA-14192
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14192
 Project: Cassandra
  Issue Type: Bug
  Components: Observability
Reporter: Jonathan Ballet


When adding a new node to an existing cluster, the {{netstats}} command, when 
called while the node is joining, shows different statistics on the node 
receiving the data than on the nodes sending the data.

Receiving node:
{code}
Mode: JOINING
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
/172.20.13.184
/172.20.30.7
Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total
[...]
/172.20.40.128
/172.20.16.45
Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total
[...]
/172.20.9.63
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name            Active   Pending   Completed   Dropped
Large messages          n/a         0           0         0
Small messages          n/a         0       11121         0
Gossip messages         n/a         0       32690         0
{code}

Sending node 1:
{code}
Mode: NORMAL
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
/172.20.21.19
Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total
[...]
Read Repair Statistics:
Attempted: 680832
Mismatch (Blocking): 716
Mismatch (Background): 279
Pool Name            Active   Pending   Completed   Dropped
Large messages          n/a         2      123307         4
Small messages          n/a         2   637010302       509
Gossip messages         n/a        23      798851     11535
{code}

Sending node 2:
{code}
Mode: NORMAL
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
/172.20.21.19
Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total
[...]
Read Repair Statistics:
Attempted: 84967
Mismatch (Blocking): 17568
Mismatch (Background): 3078
Pool Name            Active   Pending   Completed   Dropped
Large messages          n/a         2       17818         2
Small messages          n/a         2   126082304       507
Gossip messages         n/a        34      202810     11725
{code}

In this case, the join process has been running for a while, and the sending 
nodes report that they have already sent everything. However, this output 
stays unchanged for some time (maybe ~15% of the total joining time).

Meanwhile, once the sending nodes have sent everything, the receiving node's 
values stay like this until it switches to the {{NORMAL}} state. There is no 
visible catching up (from ~86 files to ~405 files, for example); the output 
goes directly from the state shown above to {{NORMAL}}.

This makes tracking the progress of the join process more difficult than it 
needs to be, because we have to compare the receiving node's values with the 
sending nodes' values and deduce the actual state from both, even though 
neither is "correct": the sending nodes say everything has been sent but stay 
in this state for a long time, while the receiving node says it still needs to 
download a lot of files/data before finishing.
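Until the two sides agree, the comparison can at least be scripted. The following is a minimal Python sketch, not part of Cassandra or nodetool, that parses the per-peer progress lines in the format shown in the outputs above and computes how far apart receiver and sender are. The names `parse_progress`, `Progress` and `file_gap` are hypothetical, and it assumes sizes are always printed in GiB.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches a per-peer progress line as printed in the netstats outputs above, e.g.
# "Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total"
# Assumption: sizes are always printed in GiB.
PROGRESS_RE = re.compile(
    r"(?P<direction>Receiving|Sending) (?P<total_files>\d+) files, "
    r"(?P<total_gib>[\d.]+) GiB total\. Already (?:received|sent) "
    r"(?P<done_files>\d+) files, (?P<done_gib>[\d.]+) GiB total"
)


@dataclass
class Progress:
    direction: str  # "Receiving" (joining node) or "Sending" (streaming peer)
    total_files: int
    total_gib: float
    done_files: int
    done_gib: float

    @property
    def percent_bytes(self) -> float:
        # Share of the announced data volume reported as transferred so far.
        if self.total_gib == 0:
            return 100.0
        return 100.0 * (self.done_gib / self.total_gib)


def parse_progress(line: str) -> Optional[Progress]:
    """Return the progress described by one netstats line, or None if the
    line is not a progress line (e.g. "Mode: JOINING")."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return Progress(
        direction=m.group("direction"),
        total_files=int(m.group("total_files")),
        total_gib=float(m.group("total_gib")),
        done_files=int(m.group("done_files")),
        done_gib=float(m.group("done_gib")),
    )


def file_gap(receiver: Progress, sender: Progress) -> int:
    # Files the sender reports as sent that the receiver has not yet counted.
    return sender.done_files - receiver.done_files
```

Fed the receiver's 405-file line and the matching line from sending node 2, `file_gap` returns 319 (405 sent vs. 86 received), which is exactly the discrepancy described in this report.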



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12503) Structure for netstats output format (JSON, YAML)

2017-12-06 Thread Jonathan Ballet (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279984#comment-16279984 ]

Jonathan Ballet commented on CASSANDRA-12503:
-

[~nelio] Have you made any progress on this recently?

I'm quite interested in having this, so if needed, I'm willing to take over 
the patch and apply the changes requested by [~yukim].

> Structure for netstats output format (JSON, YAML)
> -
>
> Key: CASSANDRA-12503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12503
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Hiroki Watanabe
>Assignee: Hiroki Watanabe
>Priority: Minor
> Fix For: 3.11.x
>
> Attachments: new_receiving.def, new_receiving.json, 
> new_receiving.yaml, new_sending.def, new_sending.json, new_sending.yaml, 
> old_receiving.def, old_sending.def, trunk.patch
>
>
> As with nodetool tpstats and tablestats (CASSANDRA-12035), nodetool netstats 
> should also support useful output formats such as JSON or YAML, so we 
> implemented it. 
> Please review the attached patch.
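To illustrate what structured output would save users from, here is a minimal Python sketch that turns the thread-pool table printed by {{nodetool netstats}} into JSON. The function name and the JSON key names are hypothetical and are not the schema used in the attached patch; it assumes the last four whitespace-separated fields of each row are the Active, Pending, Completed and Dropped columns.

```python
import json
from typing import Dict, Iterable, Optional


def parse_pool_table(lines: Iterable[str]) -> Dict[str, Dict[str, Optional[int]]]:
    """Parse the 'Pool Name / Active / Pending / Completed / Dropped' rows of
    netstats output into a dict keyed by pool name.

    Assumption: the last four whitespace-separated fields of a data row are the
    numeric columns, and everything before them is the (multi-word) pool name."""
    pools: Dict[str, Dict[str, Optional[int]]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] == "Pool":
            continue  # skip the header row and short lines
        name = " ".join(fields[:-4])
        active, pending, completed, dropped = fields[-4:]
        try:
            row = {
                # "n/a" is mapped to None instead of being guessed as a number
                "active": None if active == "n/a" else int(active),
                "pending": int(pending),
                "completed": int(completed),
                "dropped": int(dropped),
            }
        except ValueError:
            continue  # trailing fields are not numeric, so this is not a pool row
        pools[name] = row
    return pools


if __name__ == "__main__":
    sample = [
        "Pool Name            Active   Pending   Completed   Dropped",
        "Large messages          n/a         2      123307         4",
        "Small messages          n/a         2   637010302       509",
        "Gossip messages         n/a        23      798851     11535",
    ]
    print(json.dumps(parse_pool_table(sample), indent=2))
```

Scraping like this is brittle (it silently depends on column order and on pool names never ending in a number), which is the kind of fragility a native JSON or YAML output mode would remove.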



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-10-16 Thread Jonathan Ballet (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205766#comment-16205766 ]

Jonathan Ballet commented on CASSANDRA-13649:
-

I also noticed that the frequency of this error message increased a lot after 
upgrading to 3.11.1:

{code:java}
INFO  [epollEventLoopGroup-2-3] 2017-10-16 10:00:37,592 Message.java:623 - Unexpected exception during request; channel = [id: 0xb253764e, L:/10.40.3.15:9042 - R:/10.30.0.10:58996]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
    at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
{code}
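To put a number on that increase, one could count these resets per day in logs from before and after the upgrade. The following Python sketch is a hypothetical helper, not a Cassandra tool; it assumes the log layout shown above, where the exception text appears on the line after the entry header.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the header of a log entry in the format quoted above, e.g.
# "INFO  [epollEventLoopGroup-2-3] 2017-10-16 10:00:37,592 Message.java:623 - ..."
# and captures its date. Assumption: the log layout is the one shown in this thread.
ENTRY_RE = re.compile(
    r"^(?:TRACE|DEBUG|INFO|WARN|ERROR)\s+\[[^\]]*\]\s+(\d{4}-\d{2}-\d{2}) "
)

# Both strings must appear on the exception line for it to be counted.
RESET_MARKERS = ("NativeIoException", "Connection reset by peer")


def count_resets_per_day(lines: Iterable[str]) -> Counter:
    """Count 'Connection reset by peer' NativeIoExceptions per calendar day.

    The exception is logged on the line after the entry header, so each match
    is attributed to the date of the most recent header seen."""
    counts: Counter = Counter()
    current_day = None
    for line in lines:
        header = ENTRY_RE.match(line)
        if header:
            current_day = header.group(1)
        elif current_day and all(marker in line for marker in RESET_MARKERS):
            counts[current_day] += 1
    return counts
```

Any iterable of lines works, such as an open log file, so the same function can be run over logs from each version and the daily counts compared.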


> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
>Assignee: Norman Maurer
>  Labels: patch
> Fix For: 2.2.11, 3.0.15, 3.11.1, 4.0
>
> Attachments: 
> 0001-CASSANDRA-13649-Ensure-all-exceptions-are-correctly-.patch, 
> test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 


