[ https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2076:
----------------------------------
    Attachment: TEZ-2076.11.patch

>> usage: java -cp
>> $HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-SNAPSHOT-jar-with-dependencies.jar
- Removed references to snapshot.
- It generates both thin/thick jars. User can run "HADOOP_CLASSPATH=$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH hadoop jar ./target/tez-history-parser-x.y.z.jar org.apache.tez.history.ATSImportTool" or choose to use "java -cp $HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-jar-with-dependencies.jar". Changed javadoc to reflect this.

>> Code is not using slf4j. Also, where does log4j.properties come from at
>> runtime?
- Using slf4j in latest patch. Added tez-history-log4j.properties, which writes to tez-history-parser.log. User can override this at runtime using the standard -Dlog4j.configuration=file:/location

>> Secure cluster support? If not supported, throw an error.
- No special code needs to be added for secure cluster. The JDK 7 URL connection automatically takes care of it if a valid ticket is present. Tested it on secure/non-secure clusters. Haven't checked with JDK 6, but supporting JDK 6 may no longer be a requirement for this tool.

>> tez-history-parser/pom.xml needs a mvn skip deploy to ensure that the jar
>> does not get deployed to the mvn repo.
- Fixed.

>> check both not null and not empty.
- Fixed, and checking for valid dag id.

>> please add more logging to clarify in which dir the output is being
>> generated to.
- Fixed. Cleaned up directory creation as well.

>> batchSize hidden config?
- Fixed. Added it as an optional parameter.

>> does this make any assumptions on preceding
- Fixed.

>> "private int download()" - any reason to return an int return value instead
>> of a void?
- Fixed. Returns void.

>> "UTF-8" - final static field?
- Fixed.

>> " LOG.info("Limit=" + limit + ", downloaded " + "entities len=" +
>> entities.length());" - this seems like a debug log.
- Fixed.

>> might need to check whether a 404 is a valid response if no data exists or
>> if we need a better error message for dag not found if the dag entity
>> returns a 404.
- Printing the entire error contents instead. In a secure cluster, it sometimes returned a 404 message with an HTML page. Instead of parsing it, the error message prints the status code along with the error stream contents for easier debugging.

>> collapse this to a single Exception catch?
- Fixed.

>> Constants.java. any reason for the duplication?
- It extends ATSConstants and adds certain fields which are not present in ATSConstants.

> Tez framework to extract/analyze data stored in ATS for specific dag
> --------------------------------------------------------------------
>
>                 Key: TEZ-2076
>                 URL: https://issues.apache.org/jira/browse/TEZ-2076
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.11.patch,
> TEZ-2076.2.patch, TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch,
> TEZ-2076.6.patch, TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch,
> TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI
> (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This can be plugged into an analyzer which parses the data, adds semantics
> and provides an in-memory representation for further analysis.
> - This will make it possible to write different analyzer rules, which can be
> run on top of this in-memory representation to come up with analysis on the DAG.
> - Results of these analyzer rules can be rendered on to a UI (standalone
> webapp) at a later point in time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
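
For readers following the 404 discussion above, here is a minimal sketch of the "status code plus raw error stream" style of error message. It is not taken from the TEZ-2076 patch: the class name HttpErrorReport and the method describe are hypothetical, and it assumes the caller already has the status code and the error stream (for example from HttpURLConnection.getResponseCode() and getErrorStream(), the latter of which may return null).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the code in the patch.
final class HttpErrorReport {

    private HttpErrorReport() {
    }

    // Builds a message containing the status code and the unparsed error body,
    // so an HTML error page is shown verbatim instead of being interpreted.
    static String describe(int statusCode, InputStream errorStream) {
        if (errorStream == null) {
            // No error body was supplied by the caller.
            return "HTTP " + statusCode + " (no error body)";
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int read;
            while ((read = errorStream.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            // Reading the body failed; still report the status code.
            return "HTTP " + statusCode + " (error body unreadable: " + e.getMessage() + ")";
        }
        String body = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return "HTTP " + statusCode + ": " + body;
    }
}
```

A caller would log or throw with the returned string. Keeping the body unparsed avoids guessing whether the server answered with JSON or an HTML page.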