Sergey Shelukhin created HIVE-17423:
---------------------------------------
Summary: LLAP Parquet caching - support file ID in splits
Key: HIVE-17423
URL: https://issues.apache.org/jira/browse/HIVE-17423
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
To get LLAP cache data, one needs a file ID, which is either an HDFS inode ID or
a composite of path, modification time, and size. For ORC, these IDs can be
embedded into splits, in particular because inode IDs can be obtained as part of
the normal file enumeration that split generation performs anyway.
If the IDs are missing from a split, they have to be obtained for every file on
the fragment side.
We should explore adding file IDs to Parquet splits when the cache is enabled.
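
For illustration only, the two kinds of file ID described above could be modeled
roughly as follows. This is a minimal sketch, not Hive's actual code: the names
FileIdSketch, CompositeFileId, and fileIdFor are hypothetical, and the inode ID
is assumed to arrive as a nullable Long from whatever file enumeration split
generation already performs.

```java
import java.util.Objects;

public class FileIdSketch {

    /** Hypothetical fallback ID: a composite of path, modification time and size. */
    public static final class CompositeFileId {
        final String path;
        final long modTime;
        final long length;

        public CompositeFileId(String path, long modTime, long length) {
            this.path = path;
            this.modTime = modTime;
            this.length = length;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CompositeFileId)) return false;
            CompositeFileId other = (CompositeFileId) o;
            return modTime == other.modTime
                && length == other.length
                && path.equals(other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, modTime, length);
        }

        @Override
        public String toString() {
            return "[" + path + ", " + modTime + ", " + length + "]";
        }
    }

    /**
     * Picks the ID to embed in a split (assumed logic): the inode ID when the
     * file system supplied one, otherwise the composite of path, mtime and size.
     */
    public static Object fileIdFor(Long inodeId, String path, long modTime, long length) {
        return inodeId != null ? inodeId : new CompositeFileId(path, modTime, length);
    }

    public static void main(String[] args) {
        // Inode ID known at split generation time: use it directly.
        System.out.println(fileIdFor(16389L, "/warehouse/t/f.parquet", 1504000000000L, 1024L));
        // No inode ID available: fall back to the composite key.
        System.out.println(fileIdFor(null, "/warehouse/t/f.parquet", 1504000000000L, 1024L));
    }
}
```

With a value like this carried in the split, the fragment side would not need a
separate per-file lookup before consulting the cache. The composite key changes
whenever the file's modification time or size changes, so a rewritten file gets
a different ID.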
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)