[ 
https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083111#comment-16083111
 ] 

Siddharth Seth commented on HIVE-17019:
---------------------------------------

Thanks for posting the patch. Will be useful to get relevant data for a query.
- Change the top level package from llap-debug to tez-debug? (Works with both I 
believe) [~ashutoshc], [~thejas] - any recommendations on whether the code gets 
a top level module, or goes under an existing module. This allows downloading 
of various debug artifacts for a tez job - logs, metrics for llap, hiveserver2 
logs (soon), tez am logs, ATS data for the query (hive and tez).
- In the new pom.xml, dependency on hive-llap-server. 1) Is it required? 2) 
Will need to exclude some dependent artifacts. See the llap-server dependency 
handling in service/pom.xml.
- LogDownloadServlet - Should this throw an error as soon as the filename 
pattern validation fails?
- LogDownloadServlet - change to dagId/queryId validation instead
- LogDownloadServlet - a thread is being created inside the request handler? 
This should be limited outside of the request, so that only a controlled 
number of parallel artifact downloads can run.
- LogDownloadServlet - what happens in case of aggregator failure? Exception 
back to the user?
- LogDownloadServlet - seems to be generating the file to disk and then 
streaming it over. Can this be streamed over directly instead? Otherwise 
there's the possibility of leaking files. (Artifact.downloadIntoStream or some 
such?) Guessing this is complicated further by the multi-threaded artifact 
downloader. Alternatively, there needs to be a cleanup mechanism.
- Timeout on the tests
- Apache header needs to be added to files where it is missing.
- Main - Please rename to something more indicative of what the tool does.
- Main - Likely a follow up jira - parse using a standard library, instead of 
trying to parse the arguments to main directly.
- Server - Enabling the artifact should be controlled via a config. Does not 
always need to be hosted in HS2 (Default disabled, at least till security can 
be sorted out)
- Is it possible to support a timeout on the downloads? (Can be a follow up 
jira)
- ArtifactAggregator - I believe this does 2 stages of dependent artifacts / 
downloads? Stage1 - download whatever it can. Information from this should 
be adequate for stage2 downloads?
- For the ones not implemented yet (DummyArtifact) - think it's better to just 
comment out the code, instead of invoking the DummyArtifacts downloader
- Security - ACL enforcement required on secure clusters to make sure users can 
only download what they have access to. This is a must fix before this can be 
enabled by default.
- Security - this can work around yarn restrictions on log downloads, since the 
files are being accessed by the hive user.
Could you please add some details on cluster testing.
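
To illustrate the LogDownloadServlet points above (bounding parallel downloads 
outside of the request, a download timeout, surfacing failures to the caller, 
and avoiding temporary files), here is a minimal sketch. It is not taken from 
the patch: BoundedArtifactStreamer, ArtifactSource, the pool size of 4 and the 
method names are all hypothetical. It also assumes the artifacts are small 
enough to buffer in memory, which may not hold for large logs.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class BoundedArtifactStreamer {
  // Shared by all requests, so the number of parallel downloads is capped no
  // matter how many requests arrive. The size (4) is an illustrative value.
  private static final ExecutorService POOL =
      Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "artifact-download");
        t.setDaemon(true);
        return t;
      });

  /** Hypothetical stand-in for one downloadable artifact. */
  interface ArtifactSource {
    byte[] fetch() throws Exception;
  }

  /**
   * Fetches every artifact on the shared pool, then writes a zip archive
   * straight to {@code out} without creating files on disk. Closes {@code out}.
   * A failed fetch surfaces as ExecutionException and a timed-out fetch as
   * CancellationException, in both cases before anything is written.
   */
  static void streamArchive(Map<String, ArtifactSource> sources,
                            OutputStream out, long timeoutMs) throws Exception {
    List<String> names = new ArrayList<>(sources.keySet());
    List<Callable<byte[]>> tasks = new ArrayList<>();
    for (String name : names) {
      tasks.add(sources.get(name)::fetch);
    }

    // invokeAll cancels any task that has not finished when the timeout expires.
    List<Future<byte[]>> results =
        POOL.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);

    // Collect everything first so an error never leaves a half-written archive.
    Map<String, byte[]> contents = new LinkedHashMap<>();
    for (int i = 0; i < names.size(); i++) {
      contents.put(names.get(i), results.get(i).get());
    }

    try (ZipOutputStream zip = new ZipOutputStream(out)) {
      for (Map.Entry<String, byte[]> e : contents.entrySet()) {
        zip.putNextEntry(new ZipEntry(e.getKey()));
        zip.write(e.getValue());
        zip.closeEntry();
      }
    }
  }
}
```

A servlet could pass its response output stream as `out`. Buffering in memory 
trades heap usage for not having to clean up files; if artifacts are large, 
the alternative raised above (write to disk plus a cleanup mechanism) may 
still be the better fit.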

> Add support to download debugging information as an archive.
> ------------------------------------------------------------
>
>                 Key: HIVE-17019
>                 URL: https://issues.apache.org/jira/browse/HIVE-17019
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harish Jaiprakash
>            Assignee: Harish Jaiprakash
>         Attachments: HIVE-17019.01.patch
>
>
> Given a queryId or dagId, get all information related to it: like, tez am, 
> task logs, hive ats data, tez ats data, slider am status, etc. Package it 
> into an archive.


