[ https://issues.apache.org/jira/browse/TEZ-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443399#comment-16443399 ]

Harish Jaiprakash commented on TEZ-3915:
----------------------------------------

[~jeagles], the files are sequence files logged into date-partitioned
directories. These can be set up as external tables in Hive to run any
analysis. A Hive SerDe is required to read from them, but we do not want any
Hive dependencies in Tez, so I'll see where this can be added. The readers and
writers are completely contained in Tez, and anyone can use them to read and
analyse the events.
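To make "date-partitioned directories" concrete, here is a minimal sketch of how an event file path might be derived from the day it was logged. The `date=YYYY-MM-DD` Hive-style directory naming, the class name `DatePartitionPaths`, and the file name used below are all hypothetical assumptions for illustration, not the layout defined in the TEZ-3915 patch.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a Hive-style date partition path such as
// <base>/date=2018-04-18/<file>. The "date=" naming is an assumption made
// for illustration only, not taken from the Tez patch.
public final class DatePartitionPaths {
    private static final DateTimeFormatter ISO_DATE = DateTimeFormatter.ISO_LOCAL_DATE;

    private DatePartitionPaths() {}

    /** Returns the partition directory for the given day, without a trailing slash. */
    public static String partitionDir(String basePath, LocalDate day) {
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        return base + "/date=" + day.format(ISO_DATE);
    }

    /** Returns the full path of an event file inside that day's partition. */
    public static String filePath(String basePath, LocalDate day, String fileName) {
        return partitionDir(basePath, day) + "/" + fileName;
    }

    public static void main(String[] args) {
        // All events logged on the same day land in one directory, which a
        // Hive external table could expose as a partition column.
        System.out.println(filePath("/tez/history/", LocalDate.of(2018, 4, 18), "dag_1.seq"));
    }
}
```

With this layout, an analysis over a date range only has to list the matching `date=...` directories instead of scanning every file, which is the main benefit of partitioning over a single flat log.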


W.r.t. the current loggers: ATS has performance issues on the read side, and
SimpleHistoryLogger is not partitioned and, being JSON-based, has a lot of
overhead.


I initially looked at using the recovery protos, but those are designed
specifically for recovery and lack fields such as config, counters,
diagnostics, dagPlan, ... Even if they were modified, a separate converter
would have to be written to keep the recovery protos light.

> Create protobuf based history event logger.
> -------------------------------------------
>
>                 Key: TEZ-3915
>                 URL: https://issues.apache.org/jira/browse/TEZ-3915
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Harish Jaiprakash
>            Assignee: Harish Jaiprakash
>            Priority: Major
>             Fix For: 0.9.next
>
>         Attachments: TEZ-3915.01.patch, TEZ-3915.02.patch, TEZ-3915.03.patch, 
> TEZ-3915.04.patch, TEZ-3915.05.patch, TEZ-3915.06.patch
>
>
> A protobuf-based history event logger that logs directly into HDFS. Also
> implement a reader API to get the events from the files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
