[ 
https://issues.apache.org/jira/browse/CASSANDRA-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146469#comment-14146469
 ] 

Jacek Furmankiewicz commented on CASSANDRA-7303:
------------------------------------------------

I am really surprised. 

We can bring down an entire server with a single query and it won't be fixed?

The worst part about this bug is that it is customer-specific.
On one installation with maybe less data, the exact same query works great.

On a different site, with just a bit too much data, the entire server crashes.
Actually, in one case we had the entire cluster (3 nodes) crash as we were 
rebooting app servers at the same time.

I would expect a QueryTooLargeException or something but not an entire 
server/cluster to crash.
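
To illustrate the kind of behaviour meant here, below is a minimal sketch, not Cassandra code. It assumes a response buffer with a fixed size cap that rejects an oversized result with an exception instead of growing until the JVM gives up. The names QueryTooLargeException, BoundedByteArrayOutputStream and the 8-byte limit are hypothetical and only for illustration; the only real API used is java.io.ByteArrayOutputStream, the class that appears in the stack trace in the issue description.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical exception type; Cassandra does not define this class.
class QueryTooLargeException extends RuntimeException {
    QueryTooLargeException(String message) {
        super(message);
    }
}

// A ByteArrayOutputStream that refuses to grow past a fixed limit.
class BoundedByteArrayOutputStream extends ByteArrayOutputStream {
    private final int maxBytes; // assumed cap, chosen by the caller

    BoundedByteArrayOutputStream(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    // 'count' is the protected field holding the number of bytes written so far.
    private void checkRoom(int extra) {
        if ((long) count + extra > maxBytes) {
            throw new QueryTooLargeException(
                "response would exceed " + maxBytes + " bytes");
        }
    }

    @Override
    public synchronized void write(int b) {
        checkRoom(1);
        super.write(b);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
        checkRoom(len);
        super.write(b, off, len);
    }
}

class BoundedBufferDemo {
    public static void main(String[] args) {
        BoundedByteArrayOutputStream out = new BoundedByteArrayOutputStream(8);
        out.write(new byte[8], 0, 8); // fits exactly within the cap
        try {
            out.write(1); // one byte too many
        } catch (QueryTooLargeException e) {
            // The request fails; the process keeps running.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With something along these lines, a single oversized slice would fail that one request while the node stays up for everyone else.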



> OutOfMemoryError during prolonged batch processing
> --------------------------------------------------
>
>                 Key: CASSANDRA-7303
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7303
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Server: RedHat 6, 64-bit, Oracle JDK 7, Cassandra 2.0.6
> Client: Java 7, Astyanax
>            Reporter: Jacek Furmankiewicz
>              Labels: crash, outofmemory, qa-resolved
>
> We have a prolonged batch processing job.
> It writes a lot of records; each batch mutation creates roughly 300-500
> columns per row key on average (with many disparate row keys).
> It works fine, but within a few hours we get an error like this:
> ERROR [Thrift:15] 2014-05-24 14:16:20,192 CassandraDaemon.java (line 196) Exception in thread Thread[Thrift:15,5,main]
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>     at java.util.Arrays.copyOf(Arrays.java:2271)
>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>     at org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
>     at org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:183)
>     at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:678)
>     at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:611)
>     at org.apache.cassandra.thrift.Column.write(Column.java:538)
>     at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:673)
>     at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:607)
>     at org.apache.cassandra.thrift.ColumnOrSuperColumn.write(ColumnOrSuperColumn.java:517)
>     at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.write(Cassandra.java:11682)
>     at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.write(Cassandra.java:11603)
>     at org.apache.cassandra.thrift.Cassandra
> The server already has 16 GB heap, which we hear is the max Cassandra can run 
> with. The writes are heavily multi-threaded from a single server.
> The gist of the issue is that Cassandra should not crash with OOM when under 
> heavy load. It is OK to slow down, or even to start throwing operation 
> timeout exceptions, etc.
> But to just crash in the middle of the processing should not be allowed.
> Is there any internal monitoring of heap usage in Cassandra where it could 
> detect that it is getting close to the heap limit and start throttling the 
> incoming requests to avoid this type of error?
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
