[ https://issues.apache.org/jira/browse/TEZ-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443399#comment-16443399 ]
Harish Jaiprakash commented on TEZ-3915:
----------------------------------------

[~jeagles], the files are sequence files logged into date-partitioned directories. They can be set up as an external table in Hive to run any analysis. A Hive SerDe is required to read from them, but we do not want any Hive dependencies in Tez. I'll see where such a SerDe can be added. The reader and writer are completely contained in Tez, and anyone can use them to read and analyse the events.

With respect to the current loggers: ATS has performance issues on the read side, and SimpleHistoryLogger is not partitioned and, because it writes JSON, has a lot of overhead.

I initially looked at using the recovery protos, but those are designed specifically for recovery and do not have fields like config, counters, diagnostics, dagPlan, ... in them. Even if they were extended, a separate converter would still have to be written to keep the recovery protos light.

> Create protobuf based history event logger.
> -------------------------------------------
>
>                 Key: TEZ-3915
>                 URL: https://issues.apache.org/jira/browse/TEZ-3915
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Harish Jaiprakash
>            Assignee: Harish Jaiprakash
>            Priority: Major
>             Fix For: 0.9.next
>
>         Attachments: TEZ-3915.01.patch, TEZ-3915.02.patch, TEZ-3915.03.patch,
> TEZ-3915.04.patch, TEZ-3915.05.patch, TEZ-3915.06.patch
>
>
> A protobuf based history event logger, to log directly into hdfs. Implement a
> reader api also, to get the events from the files.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)