[ https://issues.apache.org/jira/browse/CASSANDRA-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257123#comment-14257123 ]
T Jake Luciani commented on CASSANDRA-8518:
-------------------------------------------

Is this a duplicate of CASSANDRA-7402?

> Cassandra Query Request Size Estimator
> --------------------------------------
>
>                 Key: CASSANDRA-8518
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8518
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cheng Ren
>
> We have been suffering from Cassandra node crashes due to out-of-memory
> errors for a long time. The heap dump from the most recent crash shows 22
> native transport request threads, each of which consumes 3.3% of the heap,
> for more than 70% in total.
> Heap dump:
> !https://dl-web.dropbox.com/get/attach1.png?_subject_uid=303980955&w=AAAVOoncBoZ5aOPbDg2TpRkUss7B-2wlrnhUAv19b27OUA|height=400,width=600!
> Expanded view of one thread:
> !https://dl-web.dropbox.com/get/Screen%20Shot%202014-12-18%20at%204.06.29%20PM.png?_subject_uid=303980955&w=AACUO4wrbxheRUxv8fwQ9P52T6gBOm5_g9zeIe8odu3V3w|height=400,width=600!
> The Cassandra version we are using (2.0.4) uses
> MemoryAwareThreadPoolExecutor as the request executor and provides a default
> request size estimator that always returns 1, meaning it limits only the
> number of requests pushed to the pool. To gain more fine-grained control
> over request handling and to better protect our nodes from OOM, we propose
> implementing a more precise estimator.
> Here are our two cents:
> For update/delete/insert requests: the size could be estimated by adding
> together the sizes of all class members.
> For scan queries: the major part of the request is the response, which can
> be estimated from historical data. For example, when we receive a scan query
> on a column family for a certain token range, we record its response size
> and use it as the estimated response size for later scan queries on the
> same cf.
> For future requests on the same cf, the response size could be calculated as
> token range * recorded size / recorded token range.
> The request size should be estimated as
> (query size + estimated response size).
> We believe what we're proposing here could be useful for other people in the
> Cassandra community as well. Would you mind giving us feedback? Please let
> us know if you have any concerns or suggestions regarding this proposal.
> Thanks,
> Cheng

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
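
For reference, the scan-size estimate described in the quoted proposal can be sketched roughly as follows. This is only an illustration of the arithmetic (token range * recorded size / recorded token range, plus the query size), not Cassandra or Netty code. The class name RequestSizeEstimator, its method names, the use of a double for the token range width, and the fallback value for a column family with no recorded scan are all assumptions made for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of the proposed estimator. None of these names
 * exist in Cassandra; they are placeholders for illustration.
 */
final class RequestSizeEstimator {

    /** Most recent scan seen on a column family. */
    private static final class Observation {
        final double tokenRangeWidth; // width of the scanned token range
        final long responseBytes;     // size of the response that scan produced

        Observation(double tokenRangeWidth, long responseBytes) {
            this.tokenRangeWidth = tokenRangeWidth;
            this.responseBytes = responseBytes;
        }
    }

    private final Map<String, Observation> lastScanByCf = new ConcurrentHashMap<>();

    // Assumption: the proposal does not say what to use before any scan has
    // been recorded for a column family, so the caller supplies a default.
    private final long defaultResponseBytes;

    RequestSizeEstimator(long defaultResponseBytes) {
        this.defaultResponseBytes = defaultResponseBytes;
    }

    /** Remember the response size of a completed scan on the given cf. */
    void recordScan(String cf, double tokenRangeWidth, long responseBytes) {
        if (tokenRangeWidth <= 0) {
            return; // an empty range cannot be used for scaling
        }
        lastScanByCf.put(cf, new Observation(tokenRangeWidth, responseBytes));
    }

    /** token range * recorded size / recorded token range */
    long estimateScanResponse(String cf, double tokenRangeWidth) {
        Observation o = lastScanByCf.get(cf);
        if (o == null) {
            return defaultResponseBytes;
        }
        return Math.round(tokenRangeWidth * o.responseBytes / o.tokenRangeWidth);
    }

    /** query size + estimated response size */
    long estimateScanRequest(String cf, long querySizeBytes, double tokenRangeWidth) {
        return querySizeBytes + estimateScanResponse(cf, tokenRangeWidth);
    }
}
```

A caller would invoke recordScan after each scan completes and estimateScanRequest before admitting a new one. Keeping only the latest observation per cf follows the proposal literally; a smoothed average would be a different design and is not what is described above.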