[ https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ralf Steppacher updated CASSANDRA-11105:
----------------------------------------
Description:

I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress tool to work for my test scenario. I followed the example at http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema to create a yaml file describing my test (attached).

I am collecting events per user id (text, partition key). Events have a session type (text), an event type (text), and a creation time (timestamp) as clustering keys, in that order, plus some more attributes required for rendering the events in a UI. For testing purposes I ended up with the following column spec and insert distribution:

{noformat}
columnspec:
  - name: created_at
    cluster: uniform(10..10000)
  - name: event_type
    size: uniform(5..10)
    population: uniform(1..30)
    cluster: uniform(1..30)
  - name: session_type
    size: fixed(5)
    population: uniform(1..4)
    cluster: uniform(1..4)
  - name: user_id
    size: fixed(15)
    population: uniform(1..1000000)
  - name: message
    size: uniform(10..100)
    population: uniform(1..100B)

insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
  select: fixed(1)/1200000
{noformat}

Running the stress tool for just the insert prints

{noformat}
Generating batches with [1..1] partitions and [0..1] rows (of [10..1200000] total rows in the partitions)
{noformat}

and then immediately starts flooding me with {{com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large}}.

I do not understand why I should be exceeding the {{batch_size_fail_threshold_in_kb: 50}} from {{cassandra.yaml}}. My understanding is that the stress tool should generate one row per batch. The size of a single row should not exceed {{8 + 10*3 + 5*3 + 15*3 + 100*3 = 398 bytes}}, assuming a worst case of every text character being a 3-byte unicode character.
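As a sanity check on the numbers above (a standalone sketch, not part of the stress tool; it assumes an 8-byte timestamp, 3-byte UTF-8 for every text character, and ignores any per-cell metadata overhead), the worst-case single-row estimate can be compared against the server's threshold and the batch sizes reported later in the C* log:

```python
# Worst-case payload of ONE row, per the columnspec above.
# Assumption: every text character is a 3-byte unicode character;
# created_at (timestamp) is 8 bytes; cell/metadata overhead ignored.
created_at   = 8        # timestamp
event_type   = 10 * 3   # size: uniform(5..10) -> at most 10 chars
session_type = 5 * 3    # size: fixed(5)
user_id      = 15 * 3   # size: fixed(15)
message      = 100 * 3  # size: uniform(10..100) -> at most 100 chars

row_bytes = created_at + event_type + session_type + user_id + message
print(row_bytes)  # 398

# batch_size_fail_threshold_in_kb: 50 -> 50 * 1024 = 51200 bytes.
# The C* log reports batches of 58024 and 77985 bytes, so each
# rejected batch must hold far more than one 398-byte row.
threshold = 50 * 1024
for batch in (58024, 77985):
    overshoot = batch - threshold        # matches the ERROR lines: 6824, 26785
    implied_rows = batch // row_bytes    # rough lower bound on rows per batch
    print(overshoot, implied_rows)
```

Under these assumptions the rejected batches contain on the order of 145+ rows, which contradicts the expectation of one row per batch.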
This is how I start the attached user scenario:

{noformat}
[rsteppac@centos bin]$ ./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 10.211.55.8
INFO  08:00:07 Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
INFO  08:00:08 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
INFO  08:00:08 New Cassandra host /10.211.55.8:9042 added
Connected to cluster: Titan_DEV
Datatacenter: datacenter1; Host: /10.211.55.8; Rack: rack1
Created schema. Sleeping 1s for propagation.
Generating batches with [1..1] partitions and [0..1] rows (of [10..1200000] total rows in the partitions)
com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
	at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
	at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:271)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:185)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:55)
	at org.apache.cassandra.stress.operations.userdefined.SchemaInsert$JavaDriverRun.run(SchemaInsert.java:87)
	at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:159)
	at org.apache.cassandra.stress.operations.userdefined.SchemaInsert.run(SchemaInsert.java:119)
	at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:309)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
	at com.datastax.driver.core.Responses$Error.asException(Responses.java:125)
	at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:120)
	at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
	at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:45)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:752)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:576)
	at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1003)
	at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:925)
	at com.datastax.shaded.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
	at com.datastax.shaded.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
	at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
	at com.datastax.shaded.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
	at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
	at com.datastax.shaded.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
	at com.datastax.shaded.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
	...
{noformat}

The C* log:

{noformat}
INFO  08:00:04 Listening for thrift clients...
WARN  08:00:07 Detected connection using native protocol version 2. Both version 1 and 2 of the native protocol are now deprecated and support will be removed in Cassandra 3.0. You are encouraged to upgrade to a client driver using version 3 of the native protocol
ERROR 08:00:14 Batch of prepared statements for [stresscql.batch_too_large] is of size 58024, exceeding specified threshold of 51200 by 6824. (see batch_size_fail_threshold_in_kb)
ERROR 08:00:15 Batch of prepared statements for [stresscql.batch_too_large] is of size 77985, exceeding specified threshold of 51200 by 26785. (see batch_size_fail_threshold_in_kb)
...
{noformat}


> cassandra-stress tool - InvalidQueryException: Batch too large
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-11105
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11105
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Cassandra 2.2.4, Java 8, CentOS 6.5
>            Reporter: Ralf Steppacher
>         Attachments: batch_too_large.yaml
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)