[ https://issues.apache.org/jira/browse/OOZIE-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807533#comment-16807533 ]
Andras Piros commented on OOZIE-3249:
-------------------------------------

Thanks for the review [~asalamon74]! Uploaded patch 002 that addresses your comments.

> [tools] Instrumentation log parser
> ----------------------------------
>
>                 Key: OOZIE-3249
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3249
>             Project: Oozie
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Andras Piros
>            Priority: Major
>         Attachments: OOZIE-3249.001.patch, OOZIE-3249.002.patch,
> oozie-instrumentation-localhost.log.2018-05-09,
> oozie-instrumentation-localhost.log.2018-05-09.out
>
>
> Oozie instrumentation logs contain a lot of information, but are difficult to
> parse, because each instrumentation log entry consists of one header line
> in plain text format (containing the timestamp) followed by multiple lines in
> JSON format (without a timestamp). Those lines of course belong together.
> {noformat}
> 2018-05-02 02:48:13,426 INFO oozieinstrumentation:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-]
> {
>   ...
>   "counters" : {
>     ...
>     "callablequeue.executed" : {
>       "count" : 5954144
>     },
>     ...
>     "callablequeue.queued" : {
>       "count" : 10596129
>     },
>     ...
>   },
>   ...
> }
> {noformat}
> There should be a simple script in {{tools/bin}} that takes as parameters:
> * input file name ({{-i}}), e.g. {{-i /path/to/oozie-instrumentation.log}}
> * output file name ({{-o}}), e.g. {{-o /path/to/oozie-instrumentation.log.out}}
> * parameters to extract ({{-p}}) in the format of
> {{path/to/json/value1,path/to/json/value2}}, in this case
> {{-p counters/callablequeue.executed/count,counters/callablequeue.queued/count}}
> The output file should contain, in CSV format:
> * a header line containing the column names
> * one line per parsed input entry (header line plus JSON lines), containing:
> ** first cell is the minutes part of the timestamp
> ** consecutive cells are the parsed JSON values, one per parameter to extract

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)