[ 
https://issues.apache.org/jira/browse/HIVE-24266?focusedWorklogId=500547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500547
 ]

ASF GitHub Bot logged work on HIVE-24266:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Oct/20 09:29
            Start Date: 14/Oct/20 09:29
    Worklog Time Spent: 10m 
      Work Description: pvary commented on a change in pull request #1576:
URL: https://github.com/apache/hive/pull/1576#discussion_r504534912



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##########
@@ -1156,13 +1157,36 @@ public BISplitStrategy(Context context, FileSystem fs, Path dir,
           } else {
             TreeMap<Long, BlockLocation> blockOffsets = SHIMS.getLocationsWithOffset(fs, fileStatus);
             for (Map.Entry<Long, BlockLocation> entry : blockOffsets.entrySet()) {
-              if (entry.getKey() + entry.getValue().getLength() > logicalLen) {
+              long blockOffset = entry.getKey();
+              long blockLength = entry.getValue().getLength();
+              if(blockOffset > logicalLen) {
                 //don't create splits for anything past logical EOF
-                continue;
+                //map is ordered, thus any possible entry in the iteration after this is bound to be > logicalLen
+                break;
               }
-              OrcSplit orcSplit = new OrcSplit(fileStatus.getPath(), fileKey, entry.getKey(),
-                entry.getValue().getLength(), entry.getValue().getHosts(), null, isOriginal, true,
-                deltas, -1, logicalLen, dir, offsetAndBucket);
+              long splitLength = blockLength;
+
+              long blockEndOvershoot = (blockOffset + blockLength) - logicalLen;
+              if (blockEndOvershoot > 0) {
+                // if logicalLen is placed within a block, we should make (this last) split out of the part of this block
+                // -> we should read less than block end
+                splitLength -= blockEndOvershoot;
+              } else if (blockOffsets.lastKey() == blockOffset && blockEndOvershoot < 0) {
+                // This is the last block but it ends before logicalLen
+                // This can happen with HDFS if hflush was called and blocks are not persisted to disk yet, but content
+                // is otherwise available for readers, as DNs have these buffers in memory at this time.
+                // -> we should read more than (persisted) block end, but surely not more than the whole block
+                if (fileStatus instanceof HdfsLocatedFileStatus) {
+                  HdfsLocatedFileStatus hdfsFileStatus = (HdfsLocatedFileStatus)fileStatus;
+                  if (hdfsFileStatus.getLocatedBlocks().isUnderConstruction()) {
+                    // blockEndOvershoot is negative here...
+                    splitLength = Math.min(splitLength - blockEndOvershoot, hdfsFileStatus.getBlockSize());

Review comment:
       Maybe this is just a theoretical problem, but if `blockOffset + hdfsFileStatus.getBlockSize()` is greater than the `logicalLen` in the last block, then we should throw an exception instead, and then we would not need the `min` here.
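
To make the arithmetic under discussion easier to follow, here is a minimal, self-contained sketch of the split-length computation from the diff above (the numbers in `main` are made up for illustration; the real code works on `FileStatus`/`HdfsLocatedFileStatus` rather than plain longs):

```java
public class SplitLengthSketch {

  // Mirrors the split-length logic in the diff: shrink the split when
  // logicalLen falls inside the block, extend it (capped at the full block
  // size) when the last persisted block ends before logicalLen (hflush case).
  static long splitLength(long blockOffset, long blockLength, long logicalLen,
                          boolean lastBlock, long blockSize) {
    long splitLength = blockLength;
    long blockEndOvershoot = (blockOffset + blockLength) - logicalLen;
    if (blockEndOvershoot > 0) {
      // logicalLen is inside this block: read less than the block end
      splitLength -= blockEndOvershoot;
    } else if (lastBlock && blockEndOvershoot < 0) {
      // last block ends before logicalLen (hflush'd but not persisted):
      // read past the persisted block end, but never past the whole block
      splitLength = Math.min(splitLength - blockEndOvershoot, blockSize);
    }
    return splitLength;
  }

  public static void main(String[] args) {
    // Block [0, 128), logicalLen 100 lies inside -> split shortened to 100
    System.out.println(splitLength(0, 128, 100, true, 256));
    // Last block persisted only to 64, logicalLen 200, block size 256 -> 200
    System.out.println(splitLength(0, 64, 200, true, 256));
  }
}
```

The second case is the one the `min` guards: with `logicalLen` of 400 and a block size of 256 the split would be capped at 256, which is the situation the comment above suggests treating as an error instead.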




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 500547)
    Time Spent: 50m  (was: 40m)

> Committed rows in hflush'd ACID files may be missing from query result
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24266
>                 URL: https://issues.apache.org/jira/browse/HIVE-24266
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, the committed rows may appear to be missing when 
> the table is read before the file is completely persisted to disk (i.e. 
> synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result the block 
> information, on which BISplitStrategy relies, is incomplete. Although the 
> side file (_flush_length) tracks the proper end of the file being written, 
> this information is ignored in favour of the block information, and we may 
> end up generating a very short split instead of one covering the larger, 
> available length.
> When ETLSplitStrategy is used, there is no attempt at all to rely on the ACID 
> side file when calculating the file length, so that needs to be fixed too.
> Moreover, newly committed rows may fail to appear because of OrcTail caching 
> in ETLSplitStrategy. For now I'm just going to recommend turning that cache 
> off to anyone who wants real-time row updates to be readable:
> {code:java}
> set hive.orc.cache.stripe.details.mem.size=0;  {code}
> ..as tweaking that code would probably open a can of worms..
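
The failure mode described above can be sketched numerically (a hedged illustration only; `persistedBlocksLen`, `sideFileLen` and `splitEnd` are hypothetical names, not Hive identifiers):

```java
public class HflushLengthSketch {

  // After hflush, the length derivable from persisted block metadata can lag
  // behind the logical length recorded in the ACID side file (_flush_length).
  // Basing the split end on the former silently drops committed rows.
  static long splitEnd(long persistedBlocksLen, long sideFileLen, boolean trustSideFile) {
    return trustSideFile ? sideFileLen : persistedBlocksLen;
  }

  public static void main(String[] args) {
    long persistedBlocksLen = 64;  // bytes reported as persisted to disk
    long sideFileLen = 200;        // logical length recorded by the writer
    System.out.println(splitEnd(persistedBlocksLen, sideFileLen, false)); // 64: rows past byte 64 are missed
    System.out.println(splitEnd(persistedBlocksLen, sideFileLen, true));  // 200: all committed content is read
  }
}
```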



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
