[ https://issues.apache.org/jira/browse/CASSANDRA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442538#comment-13442538 ]
Tyler Hobbs commented on CASSANDRA-4573:
----------------------------------------

Vijay, I'm actually not seeing very long garbage collections, if I'm reading the logs correctly. These are the relevant logs, running with a heap of 2GB and young gen size of 400MB:

{noformat}
{Heap before GC invocations=0 (full 0):
 par new generation total 368640K, used 327680K [0x2f200000, 0x48200000, 0x48200000)
  eden space 327680K, 100% used [0x2f200000, 0x43200000, 0x43200000)
  from space 40960K, 0% used [0x43200000, 0x43200000, 0x45a00000)
  to space 40960K, 0% used [0x45a00000, 0x45a00000, 0x48200000)
 concurrent mark-sweep generation total 1687552K, used 0K [0x48200000, 0xaf200000, 0xaf200000)
 concurrent-mark-sweep perm gen total 16384K, used 14333K [0xaf200000, 0xb0200000, 0xb3200000)
2012-08-27T12:03:56.096-0500: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 432013312
Max Chunk Size: 432013312
Number of Blocks: 1
Av. Block Size: 432013312
Tree Height: 1
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
[ParNew
Desired survivor size 20971520 bytes, new threshold 1 (max 1)
- age 1: 2692712 bytes, 2692712 total
: 327680K->2642K(368640K), 0.0564410 secs] 327680K->2642K(2056192K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 431996928
Max Chunk Size: 431996928
Number of Blocks: 1
Av. Block Size: 431996928
Tree Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
, 0.0567720 secs] [Times: user=0.03 sys=0.00, real=0.06 secs]
Heap after GC invocations=1 (full 0):
 par new generation total 368640K, used 2642K [0x2f200000, 0x48200000, 0x48200000)
  eden space 327680K, 0% used [0x2f200000, 0x2f200000, 0x43200000)
  from space 40960K, 6% used [0x45a00000, 0x45c94998, 0x48200000)
  to space 40960K, 0% used [0x43200000, 0x43200000, 0x45a00000)
 concurrent mark-sweep generation total 1687552K, used 0K [0x48200000, 0xaf200000, 0xaf200000)
 concurrent-mark-sweep perm gen total 16384K, used 14333K [0xaf200000, 0xb0200000, 0xb3200000)
}
Total time for which application threads were stopped: 0.0576140 seconds
Total time for which application threads were stopped: 0.0080490 seconds
Total time for which application threads were stopped: 0.0000810 seconds
Total time for which application threads were stopped: 0.0000410 seconds
Total time for which application threads were stopped: 0.0000360 seconds
Total time for which application threads were stopped: 0.0000340 seconds
Total time for which application threads were stopped: 0.0000360 seconds
Total time for which application threads were stopped: 0.0000340 seconds
Total time for which application threads were stopped: 0.0000340 seconds
Total time for which application threads were stopped: 0.0000320 seconds
Total time for which application threads were stopped: 0.0000350 seconds
Total time for which application threads were stopped: 0.0000350 seconds
Total time for which application threads were stopped: 0.0000350 seconds
Total time for which application threads were stopped: 0.0000370 seconds
Total time for which application threads were stopped: 0.0000360 seconds
Total time for which application threads were stopped: 0.0000350 seconds
Total time for which application threads were stopped: 0.0000350 seconds
Total time for which application threads were stopped: 0.0000340 seconds
Total time for which application threads were stopped: 0.0000340 seconds
Total time for which application threads were stopped: 0.0000340 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000360 seconds
Total time for which application threads were stopped: 0.0000320 seconds
Total time for which application threads were stopped: 0.0000340 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000350 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000320 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000760 seconds
Total time for which application threads were stopped: 0.0000490 seconds
Total time for which application threads were stopped: 0.0000330 seconds
Total time for which application threads were stopped: 0.0000370 seconds
Total time for which application threads were stopped: 0.0000460 seconds
Total time for which application threads were stopped: 0.0000350 seconds
Total time for which application threads were stopped: 0.0004150 seconds
Total time for which application threads were stopped: 0.0001230 seconds
Total time for which application threads were stopped: 0.0035150 seconds
{noformat}

The client-side socket timeout is set to 3 seconds, so it's not hitting that timeout due to garbage collections.
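As a rough cross-check of the pause figures above, the "Total time for which application threads were stopped" lines can be summarized and compared against the client timeout. This is only a minimal sketch: the function name `pause_summary`, the `timeout_secs` parameter, and the three-line sample are illustrative assumptions, not part of Cassandra or the attached repro script.

```python
import re

# Matches the "application threads were stopped" lines from the GC log above
# (the kind of line HotSpot prints with -XX:+PrintGCApplicationStoppedTime).
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def pause_summary(log_text, timeout_secs=3.0):
    """Summarize stop-the-world pauses found in log_text.

    timeout_secs is the client-side socket timeout to compare against
    (3 seconds in the comment above); the name is hypothetical.
    """
    pauses = [float(value) for value in STOPPED_RE.findall(log_text)]
    longest = max(pauses, default=0.0)
    return {
        "count": len(pauses),
        "longest": longest,
        "total": sum(pauses),
        "exceeds_timeout": longest >= timeout_secs,
    }


if __name__ == "__main__":
    # First three pause lines copied from the log above.
    sample = (
        "Total time for which application threads were stopped: 0.0576140 seconds\n"
        "Total time for which application threads were stopped: 0.0080490 seconds\n"
        "Total time for which application threads were stopped: 0.0000810 seconds\n"
    )
    print(pause_summary(sample))
```

The regular expression does not depend on line breaks, so it also works on a log that has been collapsed onto a single line.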
I should also note that the client-side error is different when there is a client socket timeout (something like {{TTransportException: timed out reading 4 bytes}}).

> HSHA doesn't handle large messages gracefully
> ---------------------------------------------
>
> Key: CASSANDRA-4573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4573
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Tyler Hobbs
> Assignee: Vijay
> Attachments: repro.py
>
>
> HSHA doesn't seem to enforce any kind of max message length, and when
> messages are too large, it doesn't fail gracefully.
> With debug logs enabled, you'll see this:
> {{DEBUG 13:13:31,805 Unexpected state 16}}
> Which seems to mean that there's a SelectionKey that's valid, but isn't ready
> for reading, writing, or accepting.
> Client-side, you'll get this thrift error (while trying to read a frame as
> part of {{recv_batch_mutate}}):
> {{TTransportException: TSocket read 0 bytes}}