Rajesh Balamohan created HIVE-17174:
---------------------------------------
             Summary: LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge
                 Key: HIVE-17174
                 URL: https://issues.apache.org/jira/browse/HIVE-17174
             Project: Hive
          Issue Type: Bug
            Reporter: Rajesh Balamohan
            Assignee: Rajesh Balamohan
            Priority: Minor

Currently, once the data has been transferred, the ShuffleHandler invokes `fadvise` to throw away the file's pages from the page cache. This is not very helpful for broadcast edges: the same output file is transferred to many downstream tasks, so pages dropped after one fetch may have to be read back from disk for the next one.

For example, Q50 at 1 TB scale:
{noformat}
Edges:
Map 1 <- Map 5 (BROADCAST_EDGE)
Map 6 <- Reducer 2 (BROADCAST_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 4 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 7 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 6 (CUSTOM_SIMPLE_EDGE)
Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
Reducer 9 <- Reducer 8 (SIMPLE_EDGE)

Status: Running (Executing on YARN cluster with App id application_1490656001509_6084)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 5 ..........      llap     SUCCEEDED      1          1        0        0       0       0
Map 1 ..........      llap     SUCCEEDED     11         11        0        0       0       0
Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......      llap     SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......      llap     SUCCEEDED      1          1        0        0       0       0
Map 6 ..........      llap     SUCCEEDED    139        139        0        0       0       0
Map 10 .........      llap     SUCCEEDED      1          1        0        0       0       0
Map 11 .........      llap     SUCCEEDED      1          1        0        0       0       0
Reducer 7 ......      llap     SUCCEEDED    834        834        0        0       0       0
Reducer 8 ......      llap     SUCCEEDED     24         24        0        0       0       0
Reducer 9 ......      llap     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
{noformat}

For example, the count of evictions per output file:
{noformat}
139 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_05_000000_0_18387/file.out
834 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_000000_0_18420_1/file.out
834 /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_000000_0_18420_2/file.out
{noformat}

These counts match the task counts of the consuming vertices: 139 for Map 6 and 834 for Reducer 7. This suggests that the pages of each broadcast output file were dropped once per downstream task.

It would be good to invoke fadvise only when "partition != 0". This would help retain the pages for broadcast outputs.

-- 
This message was sent by Atlassian JIRA
(v6.4.14#64029)
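The proposed rule can be illustrated with a small, self-contained Java sketch. This is not the actual LLAP ShuffleHandler code. `FadvisePolicy`, `PageCacheDropper` and `onTransferComplete` are hypothetical names. The real `posix_fadvise(POSIX_FADV_DONTNEED)` call is replaced by a callback so that the decision logic runs anywhere. The sketch assumes, as the proposal implies, that a broadcast output is always served as partition 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "fadvise only when partition != 0" rule.
// Not the actual LLAP ShuffleHandler implementation.
public class FadvisePolicy {

    /** Hypothetical stand-in for a POSIX_FADV_DONTNEED call on the served file range. */
    interface PageCacheDropper {
        void dropPages(String path, long offset, long length);
    }

    private final PageCacheDropper dropper;

    FadvisePolicy(PageCacheDropper dropper) {
        this.dropper = dropper;
    }

    /**
     * Called after a map-output segment has been sent to a fetcher.
     * Partition 0 is skipped (assumed here to be the partition a broadcast
     * output serves to every downstream task), so its pages stay cached.
     */
    void onTransferComplete(String path, int partition, long offset, long length) {
        if (partition != 0) {
            dropper.dropPages(path, offset, length);
        }
    }

    public static void main(String[] args) {
        List<String> dropped = new ArrayList<>();
        FadvisePolicy policy = new FadvisePolicy((p, off, len) -> dropped.add(p + "@" + off));

        // Broadcast output: 139 downstream tasks all fetch partition 0 of the same file.
        for (int task = 0; task < 139; task++) {
            policy.onTransferComplete("broadcast/file.out", 0, 0L, 4096L);
        }
        // Partitioned output: each of 3 consumers fetches its own partition once.
        for (int partition = 0; partition < 3; partition++) {
            policy.onTransferComplete("shuffle/file.out", partition, partition * 4096L, 4096L);
        }
        System.out.println("drops=" + dropped.size());
    }
}
```

Running `main` prints `drops=2`. The 139 fetches of the broadcast file never drop its pages, while partitions 1 and 2 of the partitioned file are dropped once each. One side effect of this simple rule is that partition 0 of a regular partitioned output is also left in the page cache.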