[ 
https://issues.apache.org/jira/browse/THRIFT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251717#comment-13251717
 ] 

Dave Watson commented on THRIFT-1559:
-------------------------------------

Internally at Facebook we use jemalloc (http://www.canonware.com/jemalloc/), which 
has per-thread memory pools. It seems to fix this issue for us.
                
> Provide memory pool for TBinaryProtocol to eliminate memory fragmentation
> -------------------------------------------------------------------------
>
>                 Key: THRIFT-1559
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1559
>             Project: Thrift
>          Issue Type: Improvement
>          Components: C++ - Library
>    Affects Versions: 0.8
>         Environment: Linux
>            Reporter: Yingfeng Zhang
>              Labels: memory
>
> We use the Thrift C++ client library (0.7/0.8) to communicate with Apache 
> Cassandra (1.0), and we frequently need to fetch large amounts of data from 
> Cassandra. The data returned by multiget_slice has the following type: 
> std::map<std::string, std::vector<ColumnOrSuperColumn> >, where 
> ColumnOrSuperColumn is a struct composed of several std::map with std::string 
> keys.
> Suppose we have 1M records and fetch 1k per call. Each call then fills one 
> "std::map<std::string, std::vector<ColumnOrSuperColumn> >" with 1k records, 
> so we need to call the Thrift RPC 1K times. We destroy that object 
> immediately after each RPC, which means we do nothing but perform the RPC 
> operation. During that period, we found that memory consumption keeps 
> growing, even if we attach jemalloc to the process for memory 
> defragmentation. 
> No matter how we tune the batch size (the 1k above) in the range from 10 to 
> 20k, the percentage of fragmented memory stays high. This means that with 
> more data, say 10M records, the RPC operations alone will eat up the memory. 
> In fact, our process was killed by the OS for consuming too much memory. 
> We believe that the current memory usage design of the Thrift C++ client 
> causes too much memory fragmentation, and that the issue becomes more serious 
> with more data and with more complicated structs such as those defined by 
> Cassandra.
> I suggest providing a memory pool for the Thrift C++ library.
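
The fetch-and-destroy pattern described above, together with the suggested pooling, can be sketched roughly as follows. This is only an illustration of the idea, not Thrift code: it assumes C++17 and uses `std::pmr` containers as a stand-in for the generated types, which (as far as I know) use the default allocator and so cannot be pooled this way without library changes. The names `Row`, `SliceResult`, `CountingResource`, `fill_batch` and `heap_allocations`, and the 1 MiB arena size, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory_resource>
#include <string>
#include <vector>

// Hypothetical stand-ins for the multiget_slice result type; NOT Thrift types.
using Row = std::pmr::vector<std::pmr::string>;
using SliceResult = std::pmr::map<std::pmr::string, Row>;

// Counts every allocation that reaches the general-purpose heap.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Fills `result` with fake rows, mimicking what deserialization would allocate.
void fill_batch(SliceResult& result, std::size_t rows, std::size_t cols) {
    std::pmr::memory_resource* res = result.get_allocator().resource();
    const std::string value(64, 'x');  // longer than typical small-string buffers
    for (std::size_t r = 0; r < rows; ++r) {
        const std::string key = "row-key-number-" + std::to_string(r);
        Row& row = result[std::pmr::string(key.c_str(), res)];
        for (std::size_t c = 0; c < cols; ++c) {
            row.emplace_back(value.c_str());  // built with the container's resource
        }
    }
}

// Runs `batches` fetch-and-destroy cycles and returns how many allocations
// reached the heap, either directly or through a reusable arena.
std::size_t heap_allocations(bool use_pool, std::size_t batches,
                             std::size_t rows, std::size_t cols) {
    CountingResource heap;
    std::vector<std::byte> arena(1 << 20);  // 1 MiB arena, size chosen arbitrarily
    std::pmr::monotonic_buffer_resource pool(arena.data(), arena.size(), &heap);
    std::pmr::memory_resource* resource = &heap;
    if (use_pool) {
        resource = &pool;
    }
    for (std::size_t b = 0; b < batches; ++b) {
        {
            SliceResult result(resource);
            fill_batch(result, rows, cols);
            assert(result.size() == rows);
        }  // the "response" is destroyed right after the call, as in the report
        pool.release();  // rewind the arena so the next batch reuses the same memory
    }
    return heap.allocations;
}
```

Calling `heap_allocations(false, 10, 100, 4)` sends every node and string to the heap, while `heap_allocations(true, 10, 100, 4)` should serve each batch from the arena as long as a batch fits in it. Whether this would actually reduce the fragmentation reported here is untested; the sketch only shows what a per-batch pool could look like.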

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
