[ 
https://issues.apache.org/jira/browse/THRIFT-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253500#comment-13253500
 ] 

Yingfeng Zhang edited comment on THRIFT-1559 at 4/13/12 4:12 PM:
-----------------------------------------------------------------

Latest feedback:
All of the experiments mentioned above were performed on a Fedora 15 desktop, 
x86_64, gcc 4.6.
However, when we switched to a rack server running Ubuntu 10.04, x86_64, gcc 
4.45, the results were different:
The Lockless allocator (locklessinc.com) performed worse than jemalloc: the 
former had 1.5G of fragmentation, while the latter had 300M. The Lockless 
allocator behaves the same in both environments, while jemalloc performs much 
better on the server than on the desktop, and better than the Lockless allocator.
                
      was (Author: yingfeng):
    Latest feedback:
All of the experiments mentioned above were performed on a Fedora 15 desktop, 
x86_64, gcc 4.6.
However, when we switched to a rack server running Ubuntu 10.04, x86_64, gcc 
4.45, the results were different:
The Lockless allocator performed worse than jemalloc: the former had 1.6G of 
fragmentation, while the latter had 300M. Both figures are much lower than 
those measured on the desktop.
                  
> Provide memory pool for TBinaryProtocol to eliminate memory fragmentation
> -------------------------------------------------------------------------
>
>                 Key: THRIFT-1559
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1559
>             Project: Thrift
>          Issue Type: Improvement
>          Components: C++ - Library
>    Affects Versions: 0.8
>         Environment: Linux
>            Reporter: Yingfeng Zhang
>              Labels: memory
>
> We use the Thrift C++ client library (0.7/0.8) to communicate with Apache 
> Cassandra (1.0), and we frequently need to fetch large amounts of data from 
> Cassandra. The data returned (by multiget_slice) has the following type:
> std::map<std::string, std::vector<ColumnOrSuperColumn> >, where 
> ColumnOrSuperColumn is a struct composed of several std::map with std::string 
> keys.
> Suppose we have 1M records and fetch 1k per call: each call fills a 
> "std::map<std::string, std::vector<ColumnOrSuperColumn> >" with 1k records, 
> so we need to make 1k Thrift RPC calls. We destroy that object immediately 
> after each RPC, which means we do nothing but perform the RPC itself. Even 
> so, we found that memory consumption keeps growing, even if we attach 
> jemalloc to the process to reduce fragmentation. 
> No matter how we tune the batch size (the 1k above) within the range 10 to 
> 20k, memory fragmentation stays high. This means that with more data, say 
> 10M records, the RPC operations alone will eat up the memory; in fact, our 
> process was killed by the OS for consuming too much memory. 
> We believe that the way the Thrift C++ client currently uses memory causes 
> too much fragmentation, and that the problem becomes more serious with more 
> data and with more complicated structs such as those defined by Cassandra.
> I suggest providing a memory pool for the Thrift C++ library.
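
To make the suggestion concrete, here is a minimal sketch of what a per-batch memory pool could look like for a result shaped like the map described above. It is not Thrift's API and not a proposed patch: Arena, ArenaAllocator, Column, fetch_batch and run_batches are all hypothetical names, Column is a simplified stand-in for ColumnOrSuperColumn, and the RPC is simulated by filling the map locally. It assumes a C++11 compiler whose standard library supports allocator-aware std::basic_string.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena: carves allocations out of large blocks
// and releases every block at once when the arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t block_size = 1 << 20)
        : block_size_(block_size), offset_(0), capacity_(0) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align <= alignof(std::max_align_t));
        std::size_t start = (offset_ + align - 1) / align * align;
        if (blocks_.empty() || start + bytes > capacity_) {
            // Current block is full (or absent): start a fresh one.
            capacity_ = bytes > block_size_ ? bytes : block_size_;
            blocks_.push_back(
                std::unique_ptr<unsigned char[]>(new unsigned char[capacity_]));
            start = 0;
        }
        offset_ = start + bytes;
        return blocks_.back().get() + start;
    }
    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_, offset_, capacity_;
    std::vector<std::unique_ptr<unsigned char[]> > blocks_;
};

// Minimal C++11 allocator that forwards to an Arena.
template <typename T>
struct ArenaAllocator {
    typedef T value_type;
    Arena* arena;
    explicit ArenaAllocator(Arena* a) : arena(a) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {}  // no-op: the arena frees in bulk
};
template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}

typedef std::basic_string<char, std::char_traits<char>, ArenaAllocator<char> >
    AString;
// Simplified stand-in for ColumnOrSuperColumn (an assumption, not the real struct).
struct Column {
    AString name, value;
    Column(const AString& n, const AString& v) : name(n), value(v) {}
};
typedef std::vector<Column, ArenaAllocator<Column> > Columns;
typedef std::map<AString, Columns, std::less<AString>,
                 ArenaAllocator<std::pair<const AString, Columns> > > Rows;

// Simulates one RPC result: builds the map inside the arena, then drops it.
std::size_t fetch_batch(Arena& arena, std::size_t records) {
    ArenaAllocator<char> ca(&arena);
    Rows rows((std::less<AString>()),
              ArenaAllocator<std::pair<const AString, Columns> >(&arena));
    for (std::size_t i = 0; i < records; ++i) {
        AString key(("row" + std::to_string(i)).c_str(), ca);
        Columns cols((ArenaAllocator<Column>(&arena)));
        cols.push_back(Column(
            AString("name", ca),
            AString("a value long enough to need a heap allocation", ca)));
        rows.emplace(std::move(key), std::move(cols));
    }
    return rows.size();
}

// One arena per batch: all memory for a batch is returned in one sweep.
std::size_t run_batches(std::size_t batches, std::size_t records_per_batch) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < batches; ++b) {
        Arena arena;
        total += fetch_batch(arena, records_per_batch);
    }
    return total;
}
```

For example, run_batches(1000, 1000) mirrors the "1k RPCs of 1k records" scenario. Because each batch's strings and nodes sit in a few large blocks that are freed together, the general-purpose allocator never sees the many small interleaved frees that tend to fragment the heap. Whether this would help in Thrift itself is untested here: the generated types use the default allocator, so the reader and the generated structs would both have to change.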

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
