[ 
https://issues.apache.org/jira/browse/DRILL-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530931#comment-14530931
 ] 

Hanifi Gunes commented on DRILL-2957:
-------------------------------------

There are couple of problems here:

1) Our tracing models seems wrong. Tracer runs on its individual thread. Since 
there is no synchronization involved while reporting, tracer may end up reading 
stale data in case concurrent deallocations take place(infamous problem of 
visibility). This is particularly true when a churn of resource cleaning occurs 
during query finalization with large queries(such like in this case). 

2) More importantly Netty's re-bucketing implementation at [1] seems wrong.

I have a PoC covering both cases at [2] where I re-prod the same outcome of 
wrong bucketing with 0% usage.

Steps are easy:
1) allocate an entire chunk
2) wait for tracer to report
3) release the buffer

Chunk with zero usage ends up in 75-100% bucket when it is free-ed. Here is an 
excerpt from log:
{panel}
Memory Usage: 
4 direct arena(s):
Chunk(s) at 0~25%:
none
Chunk(s) at 0~50%:
none
Chunk(s) at 25~75%:
none
Chunk(s) at 50~100%:
none
Chunk(s) at 75~100%:
Chunk(257187ea: 0%, 0/16777216)
Chunk(s) at 100%:
none
{panel}

1: 
https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/PoolChunkList.java#L78
2: https://github.com/hnfgns/incubator-drill/commits/DRILL-2957

> Netty Memory Manager doesn't move empty chunks between lists
> ------------------------------------------------------------
>
>                 Key: DRILL-2957
>                 URL: https://issues.apache.org/jira/browse/DRILL-2957
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Jacques Nadeau
>            Assignee: Hanifi Gunes
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> I'm seeing a pattern in the memory allocator and I need you to take a look at 
> it.  Here are the basic concepts:
> 1) We use an extension of PooledByteBufAllocator [1] called 
> PooledByteBufAllocatorL.
> 2) We use many Direct Arenas (generally one per core)
> 3) Each arena has chunk lists for different occupancies (chunks that are 
> empty, chunks 25% full, chunks 50% full, etc) [2]
> 4) Each of these chunk lists maintains a list of chunks.  The chunks move 
> from list to list as they get more or less full.
> 5) When no memory is being used, chunks move back to the empty list.
> 6) If there are excessive empty chunks, they are released back to the OS. (I 
> don't remember the exact trigger here and I'm only seeing this sometimes 
> right now.)
> We're running on Netty 4.0.27.  
> What I'm seeing is that we don't seem to be moving the chunks back to the 
> empty list as they are vacated.  You can see an example output from my memory 
> logging [3] that is enabled by [4].  I haven't replicated this at small scale 
> but at large scale I see it consistently (30 node cluster, large group by 
> query [5]).
> I want to understand this behavior better, determine if it is a bug or not 
> and determine whether or not this hurts memory for subsequent queries.
> One other note, Netty will cache small amounts of memory that is allocated 
> and released on the same thread for that thread.  I don't believe this is a 
> large amount of memory but be aware of it. It should be possible to control 
> this using these settings [6].
> [1] 
> https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java
> [2] 
> https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/PoolArena.java#L67
> [3] Memory log output at idle after large query (one example arena out of 32 
> on perf cluster, see logs on those nodes for more info):
> ::snip::
> Chunk(s) at 0~25%:
> none
> Chunk(s) at 0~50%:
> Chunk(62194b16: 0%, 0/16777216)
> Chunk(35983868: 1%, 8192/16777216)
> Chunk(5bbfb16a: 1%, 163840/16777216)
> Chunk(1c6d277e: 1%, 8192/16777216)
> Chunk(2897b6bf: 2%, 204800/16777216)
> Chunk(287d5c71: 0%, 0/16777216)
> Chunk(s) at 25~75%:
> Chunk(61bad0ee: 0%, 0/16777216)
> Chunk(s) at 50~100%:
> Chunk(2d79a032: 0%, 0/16777216)
> Chunk(42415f4e: 0%, 0/16777216)
> Chunk(33a3bade: 0%, 0/16777216)
> Chunk(1ce7ca63: 0%, 0/16777216)
> Chunk(531e1888: 0%, 0/16777216)
> Chunk(54786a09: 0%, 0/16777216)
> Chunk(5cdcb359: 0%, 0/16777216)
> Chunk(3e40137b: 0%, 0/16777216)
> Chunk(534f0fb3: 0%, 0/16777216)
> Chunk(6301ee8a: 0%, 0/16777216)
> Chunk(6a90c3aa: 0%, 0/16777216)
> Chunk(s) at 75~100%:
> none
> Chunk(s) at 100%:
> none
> ::snip::
> [4] Enable the memory logger by enabling trace level debugging for the 
> "drill.allocator" logger like this:
>   <logger name="drill.allocator" additivity="false">
>       <level value="trace" />
>       <appender-ref ref="FILE" />
>       <appender-ref ref="SOCKET" />
>  </logger>
> [5] On perf cluster
> # sqllineTPCDS
> ALTER SESSION SET `exec.errors.verbose` = true;
> ALTER SESSION SET `planner.enable_multiphase_agg` = false;
> ALTER SESSION SET `store.parquet.block-size` = 134217728;
> ALTER SESSION SET `planner.enable_mux_exchange` = false;
> ALTER SESSION SET `exec.min_hash_table_size` = 67108864;
> ALTER SESSION SET `planner.enable_hashagg` = true;
> ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 29205777612;
> ALTER SESSION SET `planner.width.max_per_node` = 23;
> create table dfs.tmp.agg33 as
> select ss_sold_date_sk , ss_sold_time_sk , ss_item_sk , ss_customer_sk , 
> ss_cdemo_sk, count(*) from `store_sales_dri30000`
>  group by ss_sold_date_sk , ss_sold_time_sk , ss_item_sk , ss_customer_sk , 
> ss_cdemo_sk;
> [6] 
> https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L98



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to