Sergey Shelukhin created HIVE-11265:
---------------------------------------
Summary: LLAP: investigate locality issues
Key: HIVE-11265
URL: https://issues.apache.org/jira/browse/HIVE-11265
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
Running q27 with split-waves 0.9 on 10 nodes x 16 executors, I get 140 mappers
reading store_sales, and 5~ more assorted vertices.
When running the query repeatedly, one would expect good locality, i.e. the
same stripes being processed on the same nodes most of the time.
However, this is only the case for 40-50% of the stripes in my experience. When
the query is run 10 times in a row, an average split (file+stripe) is read on
~4 machine. Some are actually read on a different machine every run :)
This affects cache hit ratio.
Understandably in real scenarios we won't get 100% locality, but we should not
be getting bad locality in simple cases like this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)