[
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajesh Balamohan updated TEZ-2076:
----------------------------------
Attachment: TEZ-2076.11.patch
>> usage: java -cp
>> $HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-SNAPSHOT-jar-with-dependencies.jar
- Removed references to snapshot.
- The build now generates both thin and thick jars. Users can either run
"HADOOP_CLASSPATH=$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH hadoop jar
./target/tez-history-parser-x.y.z.jar org.apache.tez.history.ATSImportTool" or
use "java -cp
$HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-jar-with-dependencies.jar".
Changed the javadoc to reflect this.
>> Code is not using slf4j. Also, where does log4j.properties come from at
>> runtime?
- Using slf4j in the latest patch. Added tez-history-log4j.properties, which
writes to tez-history-parser.log. Users can override this at runtime with the
standard -Dlog4j.configuration=file:/location
>> Secure cluster support? If not supported, throw an error.
- No special code is needed for a secure cluster. The JDK 7 URL connection
handles it automatically if a valid ticket is present. Tested on both secure
and non-secure clusters. Haven't checked with JDK 6, but supporting JDK 6 may
no longer be a requirement for this tool.
>> tez-history-parser/pom.xml needs a mvn skip deploy to ensure that the jar
>> does not get deployed to the mvn repo.
- Fixed.
>> check both not null and not empty.
- Fixed, and checking for valid dag id.
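For illustration only (this is not the code in the patch): a minimal sketch of a combined null/empty/format check for a DAG id. The class name DagIdCheck, the method isValidDagId, and the assumed id shape "dag_<digits>_<digits>_<digits>" are hypothetical and may differ from what the tool actually accepts.

```java
import java.util.regex.Pattern;

class DagIdCheck {
    // Assumed id shape (hypothetical): dag_<clusterTimestamp>_<appSequence>_<dagSequence>
    private static final Pattern DAG_ID = Pattern.compile("dag_\\d+_\\d+_\\d+");

    // Returns true only if the id is non-null, non-empty, and matches the assumed shape.
    static boolean isValidDagId(String dagId) {
        if (dagId == null) {
            return false;
        }
        String trimmed = dagId.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        return DAG_ID.matcher(trimmed).matches();
    }

    public static void main(String[] args) {
        // Print the result of each check for a few sample inputs.
        String[] samples = {"dag_1424219536171_0001_1", "", "application_1424219536171_0001"};
        for (String s : samples) {
            System.out.println("'" + s + "' valid=" + isValidDagId(s));
        }
        System.out.println("null valid=" + isValidDagId(null));
    }
}
```

Rejecting the id before issuing any request keeps a mistyped argument from surfacing later as a confusing server-side error.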
>> please add more logging to clarify in which dir the output is being
>> generated to.
- Fixed. Cleaned up directory creation as well.
>> batchSize hidden config?
- Fixed. Added it as an optional parameter.
>> does this make any assumptions on preceding
- Fixed.
>> "private int download()" - any reason to return an int return value instead
>> of a void?
- Fixed. Returns void.
>> "UTF-8" - final static field?
- Fixed.
>> " LOG.info("Limit=" + limit + ", downloaded " + "entities len=" +
>> entities.length());" - this seems like a debug log.
- Fixed.
>> might need to check whether a 404 is a valid response if no data exists or
>> if we need a better error message for dag not found if the dag entity
>> returns a 404.
- Printing the entire error contents instead. On a secure cluster, it sometimes
returned a 404 with an HTML page. Instead of parsing it, the error message
prints the status code along with the error stream contents for easier
debugging.
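For illustration only (not the code in the patch): a minimal sketch of building such a message from a status code and an error stream. The names ErrorReport and describeFailure and the message layout are hypothetical. In a real client the stream would typically come from HttpURLConnection.getErrorStream(), which can return null, so the sketch handles that case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ErrorReport {
    // Builds "HTTP status=<code>, body=<raw error body>" without trying to parse the body.
    static String describeFailure(int statusCode, InputStream errorStream) {
        StringBuilder msg = new StringBuilder("HTTP status=").append(statusCode);
        if (errorStream == null) {
            return msg.append(", no error body").toString();
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int read;
            while ((read = errorStream.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            // Keep the status code even if the body cannot be read.
            return msg.append(", body unreadable: ").append(e.getMessage()).toString();
        }
        String body = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return msg.append(", body=").append(body).toString();
    }

    public static void main(String[] args) {
        // Simulated 404 response carrying an HTML page instead of JSON.
        InputStream html = new ByteArrayInputStream(
            "<html>Not Found</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(describeFailure(404, html));
        System.out.println(describeFailure(500, null));
    }
}
```

Passing the body through untouched means an HTML error page from a proxy or an authentication layer shows up verbatim in the message.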
>> collapse this to a single Exception catch?
- Fixed.
>> Constants.java. any reason for the duplication?
- It extends ATSConstants and adds certain fields which are not present in
ATSConstants.
> Tez framework to extract/analyze data stored in ATS for specific dag
> --------------------------------------------------------------------
>
> Key: TEZ-2076
> URL: https://issues.apache.org/jira/browse/TEZ-2076
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.11.patch,
> TEZ-2076.2.patch, TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch,
> TEZ-2076.6.patch, TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch,
> TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI
> (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This can be plugged into an analyzer which parses the data, adds semantics
> and provides an in-memory representation for further analysis.
> - This will make it possible to write different analyzer rules, which can be
> run on top of this in-memory representation to produce an analysis of the DAG.
> - Results of these analyzer rules can be rendered in a UI (standalone webapp)
> at a later point in time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)