Robert Hou created DRILL-5774:
---------------------------------
Summary: Excessive memory allocation
Key: DRILL-5774
URL: https://issues.apache.org/jira/browse/DRILL-5774
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
Fix For: 1.12.0
This query exhibits excessive memory allocation:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
select count(*) from (select * from (select id, flatten(str_list) str from
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by
d.str) d1 where d1.id=0;
{noformat}
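The inner subquery relies on FLATTEN, which emits one output row per element of the repeated `str_list` column and repeats the row's other selected columns (here `id`) on each output row. The Python below is a toy model of that row expansion, not Drill's implementation. The function name `flatten_rows` and the sample rows are hypothetical stand-ins for the JSON test file:

```python
def flatten_rows(rows, list_col, out_col):
    """Toy model of FLATTEN: one output row per element of row[list_col].

    All other columns are copied unchanged onto every output row.
    """
    out = []
    for row in rows:
        for item in row[list_col]:
            new_row = {k: v for k, v in row.items() if k != list_col}
            new_row[out_col] = item
            out.append(new_row)
    return out


# Hypothetical sample input: two JSON-like records holding string lists.
rows = [
    {"id": 0, "str_list": ["a", "b" * 253]},
    {"id": 1, "str_list": ["c"]},
]
flat = flatten_rows(rows, "str_list", "str")
print(len(flat))  # 3
```

Two input records become three output records. The same expansion, applied to the large input file, is what yields the roughly 160M-row result that the sort must then process.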
This query flattens a large table. The result is about 160M records: roughly half have a one-byte string and half have a 253-byte string, and a further 40K records have 223-byte strings.
{noformat}
select length(str), count(*) from (select id, flatten(str_list) str from
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) group by
length(str);
+---------+-----------+
| EXPR$0  | EXPR$1    |
+---------+-----------+
| 223     | 40000     |
| 1       | 80042001  |
| 253     | 80000000  |
+---------+-----------+
{noformat}
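The row counts above can be checked by simple arithmetic. This Python sketch only re-adds the figures from the GROUP BY output; it assumes that each `length(str)` value equals the string's size in bytes (true for single-byte characters, but an assumption here):

```python
# Row counts keyed by string length, copied from the GROUP BY output.
# Assumption: length(str) equals the byte size of each string.
counts = {223: 40_000, 1: 80_042_001, 253: 80_000_000}

total_rows = sum(counts.values())
total_str_bytes = sum(length * n for length, n in counts.items())
avg_len = total_str_bytes / total_rows

print(f"rows: {total_rows:,}")                # rows: 160,082,001
print(f"string bytes: {total_str_bytes:,}")   # string bytes: 20,328,962,001
print(f"average length: {avg_len:.2f}")       # average length: 126.99
```

The flattened data therefore carries roughly 20 GB of string payload, at an average of about 127 bytes per string.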
From the drillbit.log:
{noformat}
2017-09-02 11:43:44,598 [26550427-6adf-a52e-2ea8-dc52d8d8433f:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, data size: 548360)
id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data size: 36864)
Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 262163, Net row width: 143, Density: 1}
{noformat}
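The data size is about 585 KB, but the batch occupies about 1 GB (1,073,819,648 bytes), giving a gross row width of 262,163 bytes against a net row width of 143 bytes. The density is logged as 1%, yet the actual ratio of data size to total size is only about 0.05%.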
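The figures in the DEBUG line are related by simple ratios, which this Python sketch recomputes from the logged values. The derivation (data size as the sum of column data sizes, row widths as size divided by record count, density as data size over total size) is inferred from how the numbers line up, not taken from Drill's source. The logged "Density: 1" is consistent with rounding a 0.05% ratio up to the next whole percent, but that rounding rule is an assumption:

```python
import math

# Figures copied from the ExternalSortBatch DEBUG line above.
record_count = 4096
str_data = 548_360            # str column "data size"
id_data = 36_864              # id column "data size"
total_size = 1_073_819_648    # "Total size" of the batch

data_size = str_data + id_data
net_row_width = data_size / record_count
gross_row_width = total_size / record_count
density = data_size / total_size

print(data_size)                   # 585224
print(round(net_row_width))        # 143
print(gross_row_width)             # 262163.0
print(f"{density:.4%}")            # 0.0545%
print(math.ceil(density * 100))    # 1 (assumed round-up matches the log)
print(total_size - 2**30)          # 77824: just over 1 GiB
```

Each row holds about 143 bytes of data but accounts for about 262 KB of allocated memory, an overhead of roughly 1,800 times.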
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)