Rahul Challapalli created DRILL-5502:
----------------------------------------
Summary: Parallelized external sort is slower compared to the
single fragment scenario on some data sets
Key: DRILL-5502
URL: https://issues.apache.org/jira/browse/DRILL-5502
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers
git.commit.id.abbrev=1e0a14c
The query below runs in a single fragment and completes in ~14 minutes:
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 62600000;
alter session set `planner.width.max_per_query` = 17;
select count(*) from (select * from
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by
columns[0]) d where d.columns[0] = '4041054511';
+---------+
| EXPR$0 |
+---------+
| 0 |
+---------+
1 row selected (832.705 seconds)
{code}
I then increased the parallelization to 10 and increased the memory allocated
to the sort tenfold, so that each individual fragment still gets a similar
amount of memory. In this case, however, the query takes ~31 minutes to
complete, more than twice as long as the single-fragment run, which is
unexpected:
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 10;
alter session set `planner.memory.max_query_memory_per_node` = 626000000;
alter session set `planner.width.max_per_query` = 17;
select count(*) from (select * from
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by
columns[0]) d where d.columns[0] = '4041054511';
+---------+
| EXPR$0 |
+---------+
| 0 |
+---------+
1 row selected (1845.508 seconds)
{code}
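As a sanity check on the two runs above, the per-fragment memory and the
slowdown can be worked out with a short calculation. This is only a sketch: it
assumes a single sort operator per fragment and that
`planner.memory.max_query_memory_per_node` is split evenly across the
`planner.width.max_per_node` fragments on a node, which may not match exactly
how Drill divides memory. The names `RUNS`, `memory_per_fragment`, and
`slowdown` are illustrative and are not Drill APIs.

```python
# Settings and timings copied from the two runs in this report.
RUNS = {
    "single fragment": {
        "width_per_node": 1,
        "max_query_memory_per_node": 62_600_000,
        "elapsed_s": 832.705,
    },
    "10 fragments": {
        "width_per_node": 10,
        "max_query_memory_per_node": 626_000_000,
        "elapsed_s": 1845.508,
    },
}


def memory_per_fragment(run):
    """Bytes available to each fragment's sort.

    Assumption: one sort per fragment and an even split of the node budget.
    """
    return run["max_query_memory_per_node"] // run["width_per_node"]


def slowdown(baseline, other):
    """How many times longer `other` took than `baseline`."""
    return other["elapsed_s"] / baseline["elapsed_s"]


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(
            f"{name}: {memory_per_fragment(run):,} bytes per fragment, "
            f"{run['elapsed_s'] / 60:.1f} min"
        )
    ratio = slowdown(RUNS["single fragment"], RUNS["10 fragments"])
    print(f"parallel run is {ratio:.2f}x slower")
```

Under these assumptions both runs give each sort the same memory budget, so
the difference in elapsed time is not explained by memory per fragment alone.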
My data set contains wide columns (5k chars wide). I will try to reproduce this
with a data set where the column width is < 256 bytes.
I have attached the data profile and log file from both scenarios. The data set
is too large to attach to a JIRA.