[ 
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2076:
----------------------------------
    Attachment: TEZ-2076.11.patch



>> usage: java -cp 
>> $HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-SNAPSHOT-jar-with-dependencies.jar
- Removed references to snapshot.
- The build now generates both thin and thick jars.  Users can run 
"HADOOP_CLASSPATH=$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH hadoop jar 
./target/tez-history-parser-x.y.z.jar org.apache.tez.history.ATSImportTool" or 
use "java -cp 
$HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-jar-with-dependencies.jar". 
 The javadoc has been updated to reflect this.

>> Code is not using slf4j. Also, where does log4j.properties come from at 
>> runtime?
- The latest patch uses slf4j. Added tez-history-log4j.properties, which writes 
to tez-history-parser.log. Users can override this at runtime using the standard 
-Dlog4j.configuration=file:/location

>> Secure cluster support? If not supported, throw an error.
- No special code is needed for secure clusters.  The JDK 7 URL connection 
takes care of authentication automatically if a valid ticket is present.  Tested 
on both secure and non-secure clusters.  I haven't checked with JDK 6, but 
supporting JDK 6 may no longer be a requirement for this tool.

>> tez-history-parser/pom.xml needs a mvn skip deploy to ensure that the jar 
>> does not get deployed to the mvn repo.
- Fixed.

>> check both not null and not empty.
- Fixed, and checking for valid dag id.
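
For illustration only, the null/empty check plus DAG id validation could look 
roughly like the sketch below. This is not the code in the patch. The class name 
DagIdCheck, the method isValidDagId, and the assumed id shape 
dag_<clusterTimestamp>_<appSequence>_<dagSequence> are all assumptions made for 
this example.

```java
import java.util.regex.Pattern;

final class DagIdCheck {
    // Assumption: DAG ids look like dag_<clusterTimestamp>_<appSequence>_<dagSequence>,
    // e.g. dag_1424219392648_0001_1. The pattern is illustrative, not taken from the patch.
    private static final Pattern DAG_ID = Pattern.compile("dag_\\d+_\\d+_\\d+");

    /** Returns true only if dagId is non-null, non-blank, and matches the assumed shape. */
    static boolean isValidDagId(String dagId) {
        if (dagId == null) {
            return false;
        }
        String trimmed = dagId.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        return DAG_ID.matcher(trimmed).matches();
    }

    public static void main(String[] args) {
        // Print the verdict for each id passed on the command line.
        for (String arg : args) {
            System.out.println(arg + " -> " + isValidDagId(arg));
        }
    }
}
```

Rejecting a malformed id up front avoids a round trip to ATS that can only fail.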

>> please add more logging to clarify in which dir the output is being 
>> generated to. 
- Fixed. Cleaned up directory creation as well.

>> batchSize hidden config?
- Fixed. Added it as an optional parameter.

>> does this make any assumptions on preceding
- Fixed.

>> "private int download()" - any reason to return an int return value instead 
>> of a void? 
- Fixed. Returns void.

>> "UTF-8" - final static field?
- Fixed.

>> " LOG.info("Limit=" + limit + ", downloaded " + "entities len=" + 
>> entities.length());" - this seems like a debug log.
- Fixed

>> might need to check whether a 404 is a valid response if no data exists or 
>> if we need a better error message for dag not found if the dag entity 
>> returns a 404.
- Printing the entire error contents instead.  On a secure cluster, it sometimes 
returned a 404 along with an HTML page.  Instead of parsing that page, the error 
message now prints the status code along with the error stream contents for 
easier debugging.
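
A minimal sketch of that idea, assuming the caller already has the status code 
(for example from HttpURLConnection.getResponseCode()) and the error stream (from 
HttpURLConnection.getErrorStream(), which may be null). The class HttpErrorText 
and the method describe are hypothetical names for this example and are not 
taken from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class HttpErrorText {
    // Charset kept in a constant rather than repeating the "UTF-8" literal.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /**
     * Builds a message from the status code plus whatever the server wrote to the
     * error stream (plain text, JSON, or an HTML page), without trying to parse it.
     */
    static String describe(int statusCode, InputStream errorStream) {
        StringBuilder message = new StringBuilder("HTTP status ").append(statusCode);
        if (errorStream == null) {
            // No body was sent with the error response.
            return message.toString();
        }
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int read;
            while ((read = errorStream.read(chunk)) != -1) {
                body.write(chunk, 0, read);
            }
        } catch (IOException e) {
            // Keep whatever was read so far and note the failure.
            message.append(" (error stream unreadable: ").append(e.getMessage()).append(")");
        }
        return message.append(": ").append(new String(body.toByteArray(), UTF8)).toString();
    }
}
```

Passing the raw body through unchanged keeps the tool independent of whatever 
error page format the cluster happens to return.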

>> collapse this to a single Exception catch? 
- Fixed

>> Constants.java. any reason for the duplication?
- It extends ATSConstants and adds certain fields which are not present in 
ATSConstants.

> Tez framework to extract/analyze data stored in ATS for specific dag
> --------------------------------------------------------------------
>
>                 Key: TEZ-2076
>                 URL: https://issues.apache.org/jira/browse/TEZ-2076
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.11.patch, 
> TEZ-2076.2.patch, TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, 
> TEZ-2076.6.patch, TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, 
> TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI 
> (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This can be plugged to an analyzer which parses the data, adds semantics 
> and provides an in-memory representation for further analysis.
> - This will make it possible to write different analyzer rules, which can be 
> run on top of this in-memory representation to come up with analysis on the DAG.
> - Results of these analyzer rules can be rendered on a UI (standalone webapp) 
> at a later point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)