[ https://issues.apache.org/jira/browse/DRILL-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reassigned DRILL-5774:
----------------------------------

    Assignee:     (was: Paul Rogers)

> Excessive memory allocation
> ---------------------------
>
>                 Key: DRILL-5774
>                 URL: https://issues.apache.org/jira/browse/DRILL-5774
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Robert Hou
>            Priority: Major
>
> This query exhibits excessive memory allocation:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> select count(*) from (select * from (select id, flatten(str_list) str from
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by
> d.str) d1 where d1.id=0;
> {noformat}
> This query does a flatten on a large table. The result is 160M records.
> Half the records have a one-byte string, and half have a 253-byte string.
> And then there are 40K records with 223 byte strings.
> {noformat}
> select length(str), count(*) from (select id, flatten(str_list) str from
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) group by
> length(str);
> +---------+-----------+
> | EXPR$0  | EXPR$1    |
> +---------+-----------+
> | 223     | 40000     |
> | 1       | 80042001  |
> | 253     | 80000000  |
> {noformat}
> From the drillbit.log:
> {noformat}
> 2017-09-02 11:43:44,598 [26550427-6adf-a52e-2ea8-dc52d8d8433f:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
>   str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134,
> data size: 548360)
>   id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data
> size: 36864)
>   Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width:
> 262163, Net row width: 143, Density: 1}
> {noformat}
> The data size is 585K, but the batch size is 1 GB. The density is 1%.
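
The figures in the log entry above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function name `batch_stats` and its return keys are hypothetical, and the formulas (size divided by record count, data size divided by total size) are an assumption about what the reported values mean, not a copy of Drill's own calculation.

```python
def batch_stats(records: int, total_size: int, data_size: int) -> dict:
    """Derive per-row widths and density from batch-level byte counts.

    Hypothetical helper for illustration; not part of Apache Drill.
    """
    return {
        # Allocated bytes per row, including unused vector capacity.
        "gross_row_width": total_size / records,
        # Bytes of actual data per row.
        "net_row_width": data_size / records,
        # Share of the allocated memory that holds data, in percent.
        "density_pct": 100 * data_size / total_size,
    }


# Values copied from the ExternalSortBatch log entry.
stats = batch_stats(records=4096, total_size=1_073_819_648, data_size=585_224)

print(f"gross row width: {stats['gross_row_width']:.0f} bytes")
print(f"net row width:   {stats['net_row_width']:.0f} bytes")
print(f"density:         {stats['density_pct']:.2f}%")
```

Under these assumptions the row widths match the logged values (262163 and 143), while the raw ratio of data size to total size comes out well below 1%. The logged "Density: 1" may therefore be a rounded or floored figure; the log entry alone does not show how it is derived.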