[ 
https://issues.apache.org/jira/browse/KUDU-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858062#comment-15858062
 ] 

Fu Lili commented on KUDU-1762:
-------------------------------

We have done some testing on this issue and are almost certain that it is 
caused by how the block cache calculates its memory usage. 
{code}
  virtual PendingHandle* Allocate(Slice key, int val_len, int charge) OVERRIDE {
    int key_len = key.size();
    DCHECK_GE(key_len, 0);
    DCHECK_GE(val_len, 0);
    int key_len_padded = KUDU_ALIGN_UP(key_len, sizeof(void*));
    uint8_t* buf = new uint8_t[sizeof(LRUHandle)
                               + key_len_padded + val_len // the kv_data VLA data
                               - 1 // (the VLA has a 1-byte placeholder)
                               ];
    LRUHandle* handle = reinterpret_cast<LRUHandle*>(buf);
    handle->key_length = key_len;
    handle->val_length = val_len;
    handle->charge = charge;
    handle->hash = HashSlice(key);
    memcpy(handle->kv_data, key.data(), key_len);

    return reinterpret_cast<PendingHandle*>(handle);
  }
{code}

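The block cache's memory usage is calculated from the size of the value alone, 
but each entry actually allocates more than that: the LRUHandle header and the 
padded key are allocated together with the value. As a result, the block 
cache's memory limit is inaccurate. Here is how we reproduce the issue: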

1. Create a table with 600 columns.
2. Insert 10 rows in which every column except the key column is null. This 
causes Kudu to create many small data blocks.
3. Query all columns with SQL such as SELECT COUNT(col_1), COUNT(col_2) ... 
FROM test. This causes Kudu to cache those data blocks.
4. Repeat steps 2 and 3. Kudu then allocates much more memory than the 
configured block_cache_capacity_mb.
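
To make the per-entry gap concrete, here is a minimal sketch of the arithmetic. It reuses the allocation formula from Allocate() above and assumes, as described in this comment, that the charge equals val_len. FakeLRUHandle, AlignUp, AllocatedBytes and UnchargedBytes are hypothetical names for illustration; the real LRUHandle layout in Kudu may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the cache's handle header. The real LRUHandle
// layout may differ; only its size matters for this estimate.
struct FakeLRUHandle {
  void* next_hash;
  void* next;
  void* prev;
  std::size_t charge;
  std::uint32_t key_length;
  std::uint32_t val_length;
  std::uint32_t hash;
  std::uint32_t refs;
  std::uint8_t kv_data[1];  // 1-byte placeholder for the trailing key/value bytes
};

// Round n up to a multiple of align (align must be a power of two).
constexpr std::size_t AlignUp(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Bytes requested from the allocator, following the formula in Allocate().
constexpr std::size_t AllocatedBytes(std::size_t key_len, std::size_t val_len) {
  return sizeof(FakeLRUHandle) + AlignUp(key_len, sizeof(void*)) + val_len - 1;
}

// Bytes per entry that the capacity limit never sees, ASSUMING the charge
// passed to Allocate() is exactly val_len.
constexpr std::size_t UnchargedBytes(std::size_t key_len, std::size_t val_len) {
  return AllocatedBytes(key_len, val_len) - val_len;
}
```

Under these assumptions the uncharged overhead is the same for every entry regardless of value size, so it is negligible for large blocks but can dominate when the cache holds many tiny blocks, as in the reproduction steps. The allocator may also round each request up to a size class, which this sketch does not model.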

> suspected tablet memory leak
> ----------------------------
>
>                 Key: KUDU-1762
>                 URL: https://issues.apache.org/jira/browse/KUDU-1762
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 1.0.1
>         Environment: CentOS 6.5
> Kudu 1.0.1 (rev e60b610253f4303b24d41575f7bafbc5d69edddb)
>            Reporter: Fu Lili
>            Priority: Critical
>         Attachments: 0B2CE7BB-EF26-4EA1-B824-3584D7D79256.png, 
> kudu_heap_prof_20161206.tar.gz, mem_rss_graph_2016_12_19.png, 
> server02_30day_rss_before_and_after_mrs_flag_2.png, 
> server02_30day_rss_before_and_after_mrs_flag.png, tserver_smaps1
>
>
> Here is the total memory info:
> {quote}
> ------------------------------------------------
> MALLOC:     1691715680 ( 1613.3 MiB) Bytes in use by application
> MALLOC: +    178733056 (  170.5 MiB) Bytes in page heap freelist
> MALLOC: +     37483104 (   35.7 MiB) Bytes in central cache freelist
> MALLOC: +      4071488 (    3.9 MiB) Bytes in transfer cache freelist
> MALLOC: +     13739264 (   13.1 MiB) Bytes in thread cache freelists
> MALLOC: +     12202144 (   11.6 MiB) Bytes in malloc metadata
> MALLOC:   ------------
> MALLOC: =   1937944736 ( 1848.2 MiB) Actual memory used (physical + swap)
> MALLOC: +       311296 (    0.3 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   ------------
> MALLOC: =   1938256032 ( 1848.5 MiB) Virtual address space used
> MALLOC:
> MALLOC:         174694              Spans in use
> MALLOC:            201              Thread heaps in use
> MALLOC:           8192              Tcmalloc page size
> ------------------------------------------------
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
> {quote}
> But in the memory detail, the sum of the Current Consumption of all the 
> children is far less than the Current Consumption of the root.
> ||Id||Parent||Limit||Current Consumption||Peak consumption||
> |root|none|4.00G|1.58G|1.74G|
> |log_cache|root|1.00G|480.8K|5.32M|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:70c8d889b0314b04a240fcb02c24a012|log_cache|128.00M|160B|160B|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:16d3c8193579445f8f766da6c7abc237|log_cache|128.00M|160B|160B|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:2c69c5cb9eb04eb48323a9268afc36a7|log_cache|128.00M|160B|160B|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:2b11d9220dab4a5f952c5b1c10a68ccd|log_cache|128.00M|69.2K|139.5K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:cec045be60af4f759497234d8815238b|log_cache|128.00M|68.6K|138.7K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:cea7a54cebd242e4997da641f5b32e3a|log_cache|128.00M|68.5K|139.3K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:9625dfde17774690a888b55024ac797a|log_cache|128.00M|68.5K|140.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:6046b33901ca43d0975f59cf7e491186|log_cache|128.00M|0B|133.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:1a18ab0915f0407b922fa7ecbe7a2f46|log_cache|128.00M|0B|132.6K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:ac54d1c1813a4e39943971cb56f248ef|log_cache|128.00M|0B|130.5K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:4438580df6cc4d469393b9d6adee68d8|log_cache|128.00M|0B|131.2K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:2f1cef7d2a494575b941baa22b8a3dc9|log_cache|128.00M|0B|131.6K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:d2ad22d202c04b2d98f1c5800df1c3b5|log_cache|128.00M|0B|132.5K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:b19b21d6b4c84f9895aad9e81559d019|log_cache|128.00M|0B|131.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:27e9531cd5814b1c9637493f05860b19|log_cache|128.00M|0B|131.1K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:425a19940239447faa0eaab4e380d644|log_cache|128.00M|68.5K|146.9K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:178bd7bc39a941a887f393b0a7848066|log_cache|128.00M|68.5K|139.9K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:91524acd28a440318918f11292ac8fdc|log_cache|128.00M|0B|132.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:be6f093aabf9460b97fc35dd026820b6|log_cache|128.00M|0B|130.4K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:dd8dd794f0f44426a3c46ce8f4b54652|log_cache|128.00M|0B|131.2K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:ed128ca7b19c4e3eaa48e9e3eb341492|log_cache|128.00M|68.5K|141.5K|
> |block_cache-sharded_lru_cache|root|none|257.05M|257.05M|
> |code_cache-sharded_lru_cache|root|none|112B|113B|
> |server|root|none|2.06M|121.97M|
> |tablet-70c8d889b0314b04a240fcb02c24a012|server|none|265B|265B|
> |txn_tracker|tablet-70c8d889b0314b04a240fcb02c24a012|64.00M|0B|0B|
> |MemRowSet-0|tablet-70c8d889b0314b04a240fcb02c24a012|none|265B|265B|
> |DeltaMemStores|tablet-70c8d889b0314b04a240fcb02c24a012|none|0B|0B|
> |tablet-16d3c8193579445f8f766da6c7abc237|server|none|265B|265B|
> |txn_tracker|tablet-16d3c8193579445f8f766da6c7abc237|64.00M|0B|0B|
> |MemRowSet-0|tablet-16d3c8193579445f8f766da6c7abc237|none|265B|265B|
> |DeltaMemStores|tablet-16d3c8193579445f8f766da6c7abc237|none|0B|0B|
> |tablet-2c69c5cb9eb04eb48323a9268afc36a7|server|none|265B|265B|
> |txn_tracker|tablet-2c69c5cb9eb04eb48323a9268afc36a7|64.00M|0B|0B|
> |MemRowSet-0|tablet-2c69c5cb9eb04eb48323a9268afc36a7|none|265B|265B|
> |DeltaMemStores|tablet-2c69c5cb9eb04eb48323a9268afc36a7|none|0B|0B|
> |tablet-2b11d9220dab4a5f952c5b1c10a68ccd|server|none|25.7K|193.7K|
> |MemRowSet-5|tablet-2b11d9220dab4a5f952c5b1c10a68ccd|none|25.4K|25.4K|
> |txn_tracker|tablet-2b11d9220dab4a5f952c5b1c10a68ccd|64.00M|0B|70.2K|
> |DeltaMemStores|tablet-2b11d9220dab4a5f952c5b1c10a68ccd|none|265B|1.0K|
> |tablet-cec045be60af4f759497234d8815238b|server|none|58.6K|192.9K|
> |MemRowSet-5|tablet-cec045be60af4f759497234d8815238b|none|58.3K|58.3K|
> |txn_tracker|tablet-cec045be60af4f759497234d8815238b|64.00M|0B|70.1K|
> |DeltaMemStores|tablet-cec045be60af4f759497234d8815238b|none|265B|1.0K|
> |tablet-cea7a54cebd242e4997da641f5b32e3a|server|none|124.4K|193.5K|
> |MemRowSet-5|tablet-cea7a54cebd242e4997da641f5b32e3a|none|124.1K|124.1K|
> |txn_tracker|tablet-cea7a54cebd242e4997da641f5b32e3a|64.00M|0B|70.0K|
> |DeltaMemStores|tablet-cea7a54cebd242e4997da641f5b32e3a|none|265B|795B|
> |tablet-9625dfde17774690a888b55024ac797a|server|none|530B|326.6K|
> |MemRowSet-22|tablet-9625dfde17774690a888b55024ac797a|none|265B|265B|
> |txn_tracker|tablet-9625dfde17774690a888b55024ac797a|64.00M|0B|71.3K|
> |DeltaMemStores|tablet-9625dfde17774690a888b55024ac797a|none|265B|1.3K|
> |tablet-6046b33901ca43d0975f59cf7e491186|server|none|530B|587.7K|
> |MemRowSet-22|tablet-6046b33901ca43d0975f59cf7e491186|none|265B|265B|
> |txn_tracker|tablet-6046b33901ca43d0975f59cf7e491186|64.00M|0B|139.3K|
> |DeltaMemStores|tablet-6046b33901ca43d0975f59cf7e491186|none|265B|1.0K|
> |tablet-1a18ab0915f0407b922fa7ecbe7a2f46|server|none|530B|383.4K|
> |MemRowSet-22|tablet-1a18ab0915f0407b922fa7ecbe7a2f46|none|265B|265B|
> |txn_tracker|tablet-1a18ab0915f0407b922fa7ecbe7a2f46|64.00M|0B|70.5K|
> |DeltaMemStores|tablet-1a18ab0915f0407b922fa7ecbe7a2f46|none|265B|1.0K|
> |tablet-ac54d1c1813a4e39943971cb56f248ef|server|none|530B|324.5K|
> |MemRowSet-11|tablet-ac54d1c1813a4e39943971cb56f248ef|none|265B|265B|
> |txn_tracker|tablet-ac54d1c1813a4e39943971cb56f248ef|64.00M|0B|69.7K|
> |DeltaMemStores|tablet-ac54d1c1813a4e39943971cb56f248ef|none|265B|1.0K|
> |tablet-4438580df6cc4d469393b9d6adee68d8|server|none|530B|325.4K|
> |MemRowSet-11|tablet-4438580df6cc4d469393b9d6adee68d8|none|265B|265B|
> |txn_tracker|tablet-4438580df6cc4d469393b9d6adee68d8|64.00M|0B|69.9K|
> |DeltaMemStores|tablet-4438580df6cc4d469393b9d6adee68d8|none|265B|1.0K|
> |tablet-2f1cef7d2a494575b941baa22b8a3dc9|server|none|530B|325.3K|
> |MemRowSet-11|tablet-2f1cef7d2a494575b941baa22b8a3dc9|none|265B|265B|
> |txn_tracker|tablet-2f1cef7d2a494575b941baa22b8a3dc9|64.00M|0B|70.3K|
> |DeltaMemStores|tablet-2f1cef7d2a494575b941baa22b8a3dc9|none|265B|1.0K|
> |tablet-d2ad22d202c04b2d98f1c5800df1c3b5|server|none|530B|326.0K|
> |MemRowSet-22|tablet-d2ad22d202c04b2d98f1c5800df1c3b5|none|265B|265B|
> |txn_tracker|tablet-d2ad22d202c04b2d98f1c5800df1c3b5|64.00M|0B|136.7K|
> |DeltaMemStores|tablet-d2ad22d202c04b2d98f1c5800df1c3b5|none|265B|1.0K|
> |tablet-b19b21d6b4c84f9895aad9e81559d019|server|none|530B|326.3K|
> |MemRowSet-22|tablet-b19b21d6b4c84f9895aad9e81559d019|none|265B|265B|
> |txn_tracker|tablet-b19b21d6b4c84f9895aad9e81559d019|64.00M|0B|70.0K|
> |DeltaMemStores|tablet-b19b21d6b4c84f9895aad9e81559d019|none|265B|1.0K|
> |tablet-27e9531cd5814b1c9637493f05860b19|server|none|530B|327.8K|
> |MemRowSet-22|tablet-27e9531cd5814b1c9637493f05860b19|none|265B|265B|
> |txn_tracker|tablet-27e9531cd5814b1c9637493f05860b19|64.00M|0B|71.8K|
> |DeltaMemStores|tablet-27e9531cd5814b1c9637493f05860b19|none|265B|1.3K|
> |tablet-425a19940239447faa0eaab4e380d644|server|none|795B|332.9K|
> |MemRowSet-11|tablet-425a19940239447faa0eaab4e380d644|none|265B|265B|
> |txn_tracker|tablet-425a19940239447faa0eaab4e380d644|64.00M|0B|76.8K|
> |DeltaMemStores|tablet-425a19940239447faa0eaab4e380d644|none|530B|1.0K|
> |tablet-178bd7bc39a941a887f393b0a7848066|server|none|530B|325.8K|
> |MemRowSet-10|tablet-178bd7bc39a941a887f393b0a7848066|none|265B|265B|
> |txn_tracker|tablet-178bd7bc39a941a887f393b0a7848066|64.00M|0B|70.4K|
> |DeltaMemStores|tablet-178bd7bc39a941a887f393b0a7848066|none|265B|1.0K|
> |tablet-91524acd28a440318918f11292ac8fdc|server|none|530B|326.4K|
> |MemRowSet-11|tablet-91524acd28a440318918f11292ac8fdc|none|265B|265B|
> |txn_tracker|tablet-91524acd28a440318918f11292ac8fdc|64.00M|0B|72.0K|
> |DeltaMemStores|tablet-91524acd28a440318918f11292ac8fdc|none|265B|1.0K|
> |tablet-be6f093aabf9460b97fc35dd026820b6|server|none|530B|588.6K|
> |MemRowSet-22|tablet-be6f093aabf9460b97fc35dd026820b6|none|265B|265B|
> |txn_tracker|tablet-be6f093aabf9460b97fc35dd026820b6|64.00M|0B|72.3K|
> |DeltaMemStores|tablet-be6f093aabf9460b97fc35dd026820b6|none|265B|1.0K|
> |tablet-dd8dd794f0f44426a3c46ce8f4b54652|server|none|530B|325.7K|
> |MemRowSet-22|tablet-dd8dd794f0f44426a3c46ce8f4b54652|none|265B|265B|
> |txn_tracker|tablet-dd8dd794f0f44426a3c46ce8f4b54652|64.00M|0B|72.4K|
> |DeltaMemStores|tablet-dd8dd794f0f44426a3c46ce8f4b54652|none|265B|795B|
> |tablet-ed128ca7b19c4e3eaa48e9e3eb341492|server|none|530B|325.6K|
> |MemRowSet-22|tablet-ed128ca7b19c4e3eaa48e9e3eb341492|none|265B|265B|
> |txn_tracker|tablet-ed128ca7b19c4e3eaa48e9e3eb341492|64.00M|0B|71.7K|
> |DeltaMemStores|tablet-ed128ca7b19c4e3eaa48e9e3eb341492|none|265B|1.0K|
> |log_block_manager|server|none|1.84M|6.19M|
> |result-tracker|server|none|0B|0B|
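
The gap reported in the quoted description can be estimated by adding up the direct children of root from the table above and subtracting the total from the root's Current Consumption. This is a rough sketch only: it assumes the K/M/G suffixes are binary units (1024-based) and uses the rounded figures shown in the table. All function and constant names are hypothetical.

```cpp
#include <cassert>

constexpr double kKiB = 1024.0;
constexpr double kMiB = 1024.0 * kKiB;
constexpr double kGiB = 1024.0 * kMiB;

// Current Consumption of root, as shown in the table (1.58G).
constexpr double RootBytes() { return 1.58 * kGiB; }

// Sum of Current Consumption over the direct children of root:
// log_cache, block_cache-sharded_lru_cache, code_cache-sharded_lru_cache, server.
constexpr double DirectChildrenBytes() {
  return 480.8 * kKiB     // log_cache
       + 257.05 * kMiB    // block_cache-sharded_lru_cache
       + 112.0            // code_cache-sharded_lru_cache
       + 2.06 * kMiB;     // server
}

// Memory attributed to root but not to any tracked child.
constexpr double UnaccountedBytes() {
  return RootBytes() - DirectChildrenBytes();
}
```

With these figures the tracked children add up to roughly 260 MiB, leaving well over 1.3 GiB of the root's consumption unattributed to any child tracker.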



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)