[jira] [Commented] (HIVE-11030) Enhance storage layer to create one delta file per write

Alan Gates (JIRA) Mon, 29 Jun 2015 13:10:20 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606294#comment-14606294 ]


Alan Gates commented on HIVE-11030:
-----------------------------------

AcidUtils.serializeDeltas and AcidUtils.deserializeDeltas:  You changed these 
to keep working with deltas passed as a list of longs.  But this means the file 
system gets statted twice: OrcInputFormat.FileGenerator figures out all the 
deltas and calls AcidUtils.serializeDeltas, which drops the statement ids; then 
when OrcInputFormat.getReader calls AcidUtils.deserializeDeltas it has to go 
back and restat the file system to find all the statement ids again.  Instead 
you should change de/serializeDeltas to pass a triple (maxtxn, mintxn, stmt).  
Or, if you prefer to extend the existing hack, it can pass a list of longs but 
use 3 slots per delta instead of 2.  This avoids losing info in serialize that 
has to be rediscovered in deserialize.
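
To illustrate the "3 slots per delta" option, here is a minimal sketch.  The 
names DeltaId and DeltaCodec are hypothetical stand-ins, not the classes in the 
patch, and both the slot order (min, max, stmt) and the use of -1 for "no 
statement id" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parsed delta; not Hive's ParsedDelta. */
final class DeltaId {
  final long minTxn;
  final long maxTxn;
  final int stmtId; // assumption: -1 means the directory name has no statement id

  DeltaId(long minTxn, long maxTxn, int stmtId) {
    this.minTxn = minTxn;
    this.maxTxn = maxTxn;
    this.stmtId = stmtId;
  }
}

/** Sketch of a flat list-of-longs encoding that keeps the statement id. */
final class DeltaCodec {
  static final int SLOTS_PER_DELTA = 3;

  static List<Long> serialize(List<DeltaId> deltas) {
    List<Long> out = new ArrayList<>(deltas.size() * SLOTS_PER_DELTA);
    for (DeltaId d : deltas) {
      out.add(d.minTxn);
      out.add(d.maxTxn);
      out.add((long) d.stmtId); // third slot carries what used to be dropped
    }
    return out;
  }

  static List<DeltaId> deserialize(List<Long> slots) {
    if (slots.size() % SLOTS_PER_DELTA != 0) {
      throw new IllegalArgumentException("expected 3 longs per delta, got " + slots.size());
    }
    List<DeltaId> out = new ArrayList<>(slots.size() / SLOTS_PER_DELTA);
    for (int i = 0; i < slots.size(); i += SLOTS_PER_DELTA) {
      out.add(new DeltaId(slots.get(i), slots.get(i + 1), slots.get(i + 2).intValue()));
    }
    return out;
  }
}
```

With an encoding along these lines, deserialize can rebuild each delta from the 
list alone, without touching the file system.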

In AcidUtils:
{code}
private static ParsedDelta parseDelta(FileStatus path) {
  ParsedDelta p = parsedDelta(path.getPath());
  return new ParsedDelta(p.getMinTransaction(), p.getMaxTransaction(),
      path, p.statementId);
}
{code}
I don't understand this code.  Why get a ParsedDelta and turn around and create 
a new one?

In parseDelta, would it be better to split the string on '_' rather than call 
indexOf twice?
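
Roughly what I have in mind for the split-based version, as a sketch only.  
DeltaNameParser is a hypothetical name, the long[] return type is just to keep 
the example self-contained, and -1 for a missing statement id is an assumption.

```java
/** Sketch: parse delta_<min>_<max>[_<stmt>] by splitting on '_'. */
final class DeltaNameParser {
  static final String DELTA_PREFIX = "delta_";

  /** Returns {minTxn, maxTxn, stmtId}; stmtId is -1 when the name has none. */
  static long[] parse(String dirName) {
    if (!dirName.startsWith(DELTA_PREFIX)) {
      throw new IllegalArgumentException(dirName + " is not a delta directory");
    }
    // Everything after the prefix is 2 or 3 numeric fields separated by '_'.
    String[] parts = dirName.substring(DELTA_PREFIX.length()).split("_");
    if (parts.length < 2 || parts.length > 3) {
      throw new IllegalArgumentException("unexpected delta name " + dirName);
    }
    long min = Long.parseLong(parts[0]);
    long max = Long.parseLong(parts[1]);
    long stmt = parts.length == 3 ? Long.parseLong(parts[2]) : -1L;
    return new long[] {min, max, stmt};
  }
}
```

One split handles both the old two-field names and the new three-field names, 
so there is no index arithmetic to get wrong.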

In OrcRawRecordMerger, in the constructor (line 489 in your patch), you added a 
call to AcidUtils.parsedDeltas.  This looks like another case where, if the 
statement id were properly preserved, we would not need to parse the file name 
again.

In OrcRecordUpdater, at the end of the constructor (line 265 in your patch), 
you're introducing a file system stat for a sanity check.  That doesn't seem 
worth it.



> Enhance storage layer to create one delta file per write
> --------------------------------------------------------
>
>                 Key: HIVE-11030
>                 URL: https://issues.apache.org/jira/browse/HIVE-11030
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 1.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch
>
>
> Currently each txn using ACID insert/update/delete will generate a delta 
> directory like delta_0000100_0000101.  In order to support multi-statement 
> transactions we must generate one delta per operation within the transaction 
> so the deltas would be named like delta_0000100_0000101_0001, etc.
> Support for MERGE (HIVE-10924) would need the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
