Ádám Szita created HIVE-24266:
---------------------------------
Summary: Committed rows in hflush'd ACID files may be missing from
query result
Key: HIVE-24266
URL: https://issues.apache.org/jira/browse/HIVE-24266
Project: Hive
Issue Type: Bug
Reporter: Ádám Szita
Assignee: Ádám Szita
In an HDFS environment, if a writer uses hflush to write ORC ACID files during
a transaction commit, committed rows may appear to be missing when the table is
read before the file is completely persisted to disk (that is, synced).
This happens because hflush does not persist the new buffers to disk; it only
ensures that new readers can see the new content. As a result, the block
information on which BISplitStrategy relies is incomplete. Although the side
file (_flush_length) tracks the proper end of the file that is being written,
this information is neglected in favour of the block information, and we may
end up generating a very short split instead of one that covers the larger,
available length.
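
The length selection described above can be sketched roughly as follows. This
is only an illustration, not the Hive code path and not necessarily the fix
adopted for this issue. It assumes that the side file is a sequence of 8-byte
big-endian lengths whose last complete entry marks the end of the flushed
data. The class FlushLengthSketch and the methods lastFlushLength and
splitLength are hypothetical names, and plain java.io streams stand in for
the HDFS file system.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class FlushLengthSketch {

    /**
     * Returns the last complete 8-byte length recorded in the side file,
     * or -1 if the side file holds no complete entry.
     * Assumption: entries are big-endian longs appended one after another.
     */
    public static long lastFlushLength(InputStream sideFile) throws IOException {
        DataInputStream in = new DataInputStream(sideFile);
        long last = -1L;
        try {
            while (true) {
                last = in.readLong();
            }
        } catch (EOFException endOfSideFile) {
            // End of data reached; a partially written trailing entry is ignored.
        }
        return last;
    }

    /**
     * Picks the length a split should cover: the side-file length when one
     * is available, otherwise the (possibly stale) block-reported length.
     */
    public static long splitLength(long blockReportedLength, long flushLength) {
        return flushLength >= 0 ? flushLength : blockReportedLength;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a side file after two flushes, at 1024 and 4096 bytes.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(1024L);
        out.writeLong(4096L);

        long flushed = lastFlushLength(new ByteArrayInputStream(bytes.toByteArray()));
        // The block information still reports only 10 bytes for the file.
        System.out.println(splitLength(10L, flushed));
    }
}
```

In this sketch the side-file value wins whenever it exists, so a split built
from it covers all flushed data even while the block information lags behind.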
--
This message was sent by Atlassian Jira
(v8.3.4#803005)