[jira] [Updated] (HIVE-20935) Upload of llap package tarball fails in EC2 causing LLAP service start failure
[ https://issues.apache.org/jira/browse/HIVE-20935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-20935: -- Resolution: Fixed Fix Version/s: 4.0.0 Status: Resolved (was: Patch Available) Committed to master. Thanks [~gsaha] and [~prasanth_j] for the review. Does this need to go into other branches as well? > Upload of llap package tarball fails in EC2 causing LLAP service start failure > -- > > Key: HIVE-20935 > URL: https://issues.apache.org/jira/browse/HIVE-20935 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.1 >Reporter: Gour Saha >Assignee: Gour Saha >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-20935.01.patch, HIVE-20935.02.patch, > HIVE-20935.03.patch > > > Even though the package dir is defined as below (with a / at the end) - > {code} > LLAP_PACKAGE_DIR = ".yarn/package/LLAP/"; > {code} > the copyLocalFileToHdfs API fails to create the dir hierarchy of > .yarn/package/LLAP/ first and then copy the file under it. It instead uploads > the file under .yarn/package with the name LLAP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
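For context, a minimal sketch of the fix pattern the description calls for, assuming Hadoop's FileSystem API (the class name and home-directory staging path are illustrative, not the actual patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class PackageUploader {
  // Create the target directory first, so the tarball lands under
  // .yarn/package/LLAP/ instead of being written as a file named "LLAP"
  // under .yarn/package.
  public static Path uploadPackage(Configuration conf, Path localTarball) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    Path packageDir = new Path(fs.getHomeDirectory(), ".yarn/package/LLAP");
    fs.mkdirs(packageDir);
    Path dest = new Path(packageDir, localTarball.getName());
    fs.copyFromLocalFile(false, true, localTarball, dest); // keep src, overwrite dest
    return dest;
  }
}
{code}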
[jira] [Updated] (HIVE-20935) Upload of llap package tarball fails in EC2 causing LLAP service start failure
[ https://issues.apache.org/jira/browse/HIVE-20935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-20935: -- Attachment: HIVE-20935.03.patch > Upload of llap package tarball fails in EC2 causing LLAP service start failure > -- > > Key: HIVE-20935 > URL: https://issues.apache.org/jira/browse/HIVE-20935 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.1 >Reporter: Gour Saha >Assignee: Gour Saha >Priority: Major > Attachments: HIVE-20935.01.patch, HIVE-20935.02.patch, > HIVE-20935.03.patch > > > Even though the package dir is defined as below (with a / at the end) - > {code} > LLAP_PACKAGE_DIR = ".yarn/package/LLAP/"; > {code} > the copyLocalFileToHdfs API fails to create the dir hierarchy of > .yarn/package/LLAP/ first and then copy the file under it. It instead uploads > the file under .yarn/package with the name LLAP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20935) Upload of llap package tarball fails in EC2 causing LLAP service start failure
[ https://issues.apache.org/jira/browse/HIVE-20935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692045#comment-16692045 ] Siddharth Seth commented on HIVE-20935: --- [~prasanth_j], [~jdere] - what is required to get this re-tested? The test failure isn't related to this change. Does the CI have to be green for the change to go in? > Upload of llap package tarball fails in EC2 causing LLAP service start failure > -- > > Key: HIVE-20935 > URL: https://issues.apache.org/jira/browse/HIVE-20935 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.1 >Reporter: Gour Saha >Assignee: Gour Saha >Priority: Major > Attachments: HIVE-20935.01.patch > > > Even though the package dir is defined as below (with a / at the end) - > {code} > LLAP_PACKAGE_DIR = ".yarn/package/LLAP/"; > {code} > the copyLocalFileToHdfs API fails to create the dir hierarchy of > .yarn/package/LLAP/ first and then copy the file under it. It instead uploads > the file under .yarn/package with the name LLAP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20763) Add google cloud storage (gs) to the exim uri schema whitelist
[ https://issues.apache.org/jira/browse/HIVE-20763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-20763: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the reviews. Committed. > Add google cloud storage (gs) to the exim uri schema whitelist > -- > > Key: HIVE-20763 > URL: https://issues.apache.org/jira/browse/HIVE-20763 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-20763.01.patch > > > import/export is enabled for s3a by default. Ideally this list should include > other cloud storage options. This Jira adds Google Storage to the list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
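For reference, what this change amounts to from a user's perspective: the exim whitelist is the hive.exim.uri.scheme.whitelist setting. A hedged sketch of enabling gs alongside existing schemes (the exact value below is illustrative; keep whatever schemes your deployment already allows):
{code}
import org.apache.hadoop.hive.conf.HiveConf;

public final class EximWhitelistExample {
  public static HiveConf withGs() {
    HiveConf conf = new HiveConf();
    // Add gs to the schemes IMPORT/EXPORT is allowed to read from and write to.
    conf.set("hive.exim.uri.scheme.whitelist", "hdfs,pfile,file,s3,s3a,gs");
    return conf;
  }
}
{code}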
[jira] [Updated] (HIVE-20763) Add google cloud storage (gs) to the exim uri schema whitelist
[ https://issues.apache.org/jira/browse/HIVE-20763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-20763: -- Status: Patch Available (was: Open) > Add google cloud storage (gs) to the exim uri schema whitelist > -- > > Key: HIVE-20763 > URL: https://issues.apache.org/jira/browse/HIVE-20763 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-20763.01.patch > > > import/export is enabled for s3a by default. Ideally this list should include > other cloud storage options. This Jira adds Google Storage to the list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20763) Add google cloud storage (gs) to the exim uri schema whitelist
[ https://issues.apache.org/jira/browse/HIVE-20763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-20763: -- Fix Version/s: 4.0.0 > Add google cloud storage (gs) to the exim uri schema whitelist > -- > > Key: HIVE-20763 > URL: https://issues.apache.org/jira/browse/HIVE-20763 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-20763.01.patch > > > import/export is enabled for s3a by default. Ideally this list should include > other cloud storage options. This Jira adds Google Storage to the list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20763) Add google cloud storage (gs) to the exim uri schema whitelist
[ https://issues.apache.org/jira/browse/HIVE-20763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654083#comment-16654083 ] Siddharth Seth commented on HIVE-20763: --- Uploaded a trivial patch. No explicit tests needed for this. [~sershe], [~prasanth_j] - could you please take a look. > Add google cloud storage (gs) to the exim uri schema whitelist > -- > > Key: HIVE-20763 > URL: https://issues.apache.org/jira/browse/HIVE-20763 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-20763.01.patch > > > import/export is enabled for s3a by default. Ideally this list should include > other cloud storage options. This Jira adds Google Storage to the list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-20763) Add google cloud storage (gs) to the exim uri schema whitelist
[ https://issues.apache.org/jira/browse/HIVE-20763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-20763: - > Add google cloud storage (gs) to the exim uri schema whitelist > -- > > Key: HIVE-20763 > URL: https://issues.apache.org/jira/browse/HIVE-20763 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Attachments: HIVE-20763.01.patch > > > import/export is enabled for s3a by default. Ideally this list should include > other cloud storage options. This Jira adds Google Storage to the list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20763) Add google cloud storage (gs) to the exim uri schema whitelist
[ https://issues.apache.org/jira/browse/HIVE-20763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-20763: -- Attachment: HIVE-20763.01.patch > Add google cloud storage (gs) to the exim uri schema whitelist > -- > > Key: HIVE-20763 > URL: https://issues.apache.org/jira/browse/HIVE-20763 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Major > Attachments: HIVE-20763.01.patch > > > import/export is enabled for s3a by default. Ideally this list should include > other cloud storage options. This Jira adds Google Storage to the list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19222) TestNegativeCliDriver tests are failing due to "java.lang.OutOfMemoryError: GC overhead limit exceeded"
[ https://issues.apache.org/jira/browse/HIVE-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439824#comment-16439824 ] Siddharth Seth commented on HIVE-19222: --- [~aihuaxu] - the memory decrease would likely have been a result of running with smaller values, which should be good enough for tests. So - no specific insight into what may have caused the increase / OOM. > TestNegativeCliDriver tests are failing due to "java.lang.OutOfMemoryError: > GC overhead limit exceeded" > --- > > Key: HIVE-19222 > URL: https://issues.apache.org/jira/browse/HIVE-19222 > Project: Hive > Issue Type: Sub-task >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > Attachments: HIVE-19222.1.patch > > > TestNegativeCliDriver tests are failing with OOM recently. Not sure why. I > will try to increase the memory to test out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19021) WM counters are not properly propagated from LLAP to AM
[ https://issues.apache.org/jira/browse/HIVE-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410057#comment-16410057 ] Siddharth Seth commented on HIVE-19021: --- Kind of rusty on some of this. Given Tez counters are only created when the task starts running, the approach looks ok to me. Didn't understand the comment about "Need to update earlier for runtimes". Getting counters before the RunningTask is created - looks like this will require bigger changes. There's a bunch of "null" checks all over the patch, which seem very unnecessary. Also checks to see if the TezCounters are already set, with a log message. Are these counters ever expected to be null? > WM counters are not properly propagated from LLAP to AM > --- > > Key: HIVE-19021 > URL: https://issues.apache.org/jira/browse/HIVE-19021 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HIVE-19021.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18326) LLAP Tez scheduler - only preempt tasks if there's a dependency between them
[ https://issues.apache.org/jira/browse/HIVE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317403#comment-16317403 ] Siddharth Seth commented on HIVE-18326: --- On DAG itself - I think this is the only place. That's a core component internal to Tez. I would be very careful about depending on this. I know there's other places where Tez internals are used - they're mostly from the runtime though. I think a Tez API specific change can be made. In fact TEZ-3770 will likely go in without API changes, which I had asked for there as well. > LLAP Tez scheduler - only preempt tasks if there's a dependency between them > > > Key: HIVE-18326 > URL: https://issues.apache.org/jira/browse/HIVE-18326 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-18326.01.patch, HIVE-18326.02.patch, > HIVE-18326.02.patch, HIVE-18326.patch > > > It is currently possible for e.g. two sides of a union (or a join for that > matter) to have slightly different priorities. We don't want to preempt > running tasks on one side in favor of the other side in such cases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-18326) LLAP Tez scheduler - only preempt tasks if there's a dependency between them
[ https://issues.apache.org/jira/browse/HIVE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316919#comment-16316919 ] Siddharth Seth commented on HIVE-18326: --- {code} +DAG dag = (DAG) info; {code} is a giant hack, and can break in the future if Tez implements DagInfo in a different way. May be better to modify Tez to expose the relevant information via DagInfo, rather than casting to DAG which is an internal structure to Tez. Relevant jira on the Tez side: https://issues.apache.org/jira/browse/TEZ-3770 > LLAP Tez scheduler - only preempt tasks if there's a dependency between them > > > Key: HIVE-18326 > URL: https://issues.apache.org/jira/browse/HIVE-18326 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-18326.01.patch, HIVE-18326.02.patch, > HIVE-18326.02.patch, HIVE-18326.patch > > > It is currently possible for e.g. two sides of a union (or a join for that > matter) to have slightly different priorities. We don't want to preempt > running tasks on one side in favor of the other side in such cases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17431) change configuration handling in TezSessionState
[ https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208535#comment-16208535 ] Siddharth Seth commented on HIVE-17431: --- So: new sessions use new configs (and there are enough checks in place to reset these sessions / launch new ones if sufficient context changes), and when there isn't a sufficient context change, local resources are added for the DAG? Makes sense. > change configuration handling in TezSessionState > > > Key: HIVE-17431 > URL: https://issues.apache.org/jira/browse/HIVE-17431 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17431.patch > > > The configuration is only set when opening the session; that seems > unnecessary - it could be set in the ctor and made final. E.g. when updating > the session and localizing new resources we may theoretically open the > session with a new config, but we don't update the config and only update the > files if the session is already open, which seems to imply that it's ok to > not update the config. > In most cases, the session is opened only once or reopened without intending > to change the config (e.g. if it times out). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
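To make the contract described above concrete, a minimal sketch of a session whose config is bound once (class and method names are illustrative, not the actual TezSessionState API):
{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class FixedConfSession {
  private final HiveConf conf;

  public FixedConfSession(HiveConf conf) {
    this.conf = conf; // bound once; a sufficiently different config means a new session
  }

  public void open() {
    // open/reopen always uses this.conf; new local resources for an
    // already-open session are attached to the DAG rather than the session.
  }
}
{code}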
[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186152#comment-16186152 ] Siddharth Seth commented on HIVE-16927: --- +1. Looks good to me. > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16927.1.patch > > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17431) change configuration handling in TezSessionState
[ https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156046#comment-16156046 ] Siddharth Seth commented on HIVE-17431: --- {code} refreshLocalResourcesFromConf(conf); {code} in openInternal seems to be a potential problem area. Either it is missing LRs for the new session, or this code should not exist anymore. For the most part, I suspect some of the other parameters in this class can be made final as well. Unrelated to the patch: - There's places where the queue apparently gets changed from TezSessionPool. Didn't know a single SessionState could be moved across queues. Seems unnecessary. - replaceSession - maybe simpler to move the implementation into TezSessionState itself. e.g. additionLocalResourcesNotFromConf is fetched and then passed back in to the open method... > change configuration handling in TezSessionState > > > Key: HIVE-17431 > URL: https://issues.apache.org/jira/browse/HIVE-17431 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17431.patch > > > The configuration is only set when opening the session; that seems > unnecessary - it could be set in the ctor and made final. E.g. when updating > the session and localizing new resources we may theoretically open the > session with a new config, but we don't update the config and only update the > files if the session is already open, which seems to imply that it's ok to > not update the config. > In most cases, the session is opened only once or reopened without intending > to change the config (e.g. if it times out). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17380) refactor LlapProtocolClientProxy to be usable with other protocols
[ https://issues.apache.org/jira/browse/HIVE-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141017#comment-16141017 ] Siddharth Seth commented on HIVE-17380: --- This mainly makes the one-per-node handling a little more generic? Is there another protocol this needs to be used with? I was, at some point, planning to delink from protocols etc. Essentially 1 thread per any single entity. +1 for the patch. > refactor LlapProtocolClientProxy to be usable with other protocols > -- > > Key: HIVE-17380 > URL: https://issues.apache.org/jira/browse/HIVE-17380 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17380.patch, HIVE-17380.patch > > > This basically moves a bunch of code into a generic async PB RPC proxy, in > llap-common for now. Moving to common would require one to move LlapNodeId, > that can be done later. > The only logic change is that the concurrent hash map, which never expires, is > replaced by a Guava cache. A path to shut down a proxy is added, but does > nothing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
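The ConcurrentHashMap-to-Guava-cache swap mentioned in the description looks roughly like the sketch below; the key type, value type, and expiry are illustrative assumptions:
{code}
import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public final class ProxyStateCache {
  // Unlike a plain ConcurrentHashMap, entries for idle nodes can expire.
  private final Cache<String, Object> perNodeState = CacheBuilder.newBuilder()
      .expireAfterAccess(10, TimeUnit.MINUTES) // illustrative expiry
      .build();

  public Object lookup(String nodeId) {
    return perNodeState.getIfPresent(nodeId); // null if absent or expired
  }
}
{code}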
[jira] [Commented] (HIVE-17360) Tez session reopen appears to use a wrong conf object
[ https://issues.apache.org/jira/browse/HIVE-17360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136116#comment-16136116 ] Siddharth Seth commented on HIVE-17360: --- +1 > Tez session reopen appears to use a wrong conf object > - > > Key: HIVE-17360 > URL: https://issues.apache.org/jira/browse/HIVE-17360 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17360.01.patch, HIVE-17360.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128100#comment-16128100 ] Siddharth Seth commented on HIVE-17256: --- I did actually mean TaskExecutorService tests, but you say that is already covered. +1. (A short writeup on the overall plan would be useful for reference) > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.01.patch, HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as no one uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority, and > preempts a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17294) LLAP: switch task heartbeats to protobuf
[ https://issues.apache.org/jira/browse/HIVE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123787#comment-16123787 ] Siddharth Seth edited comment on HIVE-17294 at 8/11/17 6:21 PM: The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). When I say part of the work is done - it's the representation of various pieces of information in protobuf. was (Author: sseth): The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). > LLAP: switch task heartbeats to protobuf > > > Key: HIVE-17294 > URL: https://issues.apache.org/jira/browse/HIVE-17294 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17294) LLAP: switch task heartbeats to protobuf
[ https://issues.apache.org/jira/browse/HIVE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123787#comment-16123787 ] Siddharth Seth commented on HIVE-17294: --- The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). > LLAP: switch task heartbeats to protobuf > > > Key: HIVE-17294 > URL: https://issues.apache.org/jira/browse/HIVE-17294 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
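For orientation, switching a Hadoop RPC protocol between engines is a per-protocol registration along the lines below; the protocol interface is a hypothetical stand-in for the task-to-AM umbilical, not a real class:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtobufRpcEngine;
import org.apache.hadoop.ipc.RPC;

public final class RpcEngineSetup {
  // Hypothetical stand-in for the umbilical protocol interface.
  public interface TaskUmbilicalPB {}

  public static void useProtobuf(Configuration conf) {
    // Register the protobuf engine for this protocol; without a registration,
    // Hadoop falls back to WritableRpcEngine.
    RPC.setProtocolEngine(conf, TaskUmbilicalPB.class, ProtobufRpcEngine.class);
  }
}
{code}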
[jira] [Commented] (HIVE-17294) LLAP: switch task heartbeats to protobuf
[ https://issues.apache.org/jira/browse/HIVE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122524#comment-16122524 ] Siddharth Seth commented on HIVE-17294: --- Nope. Half the work is already done in terms of defining the protobuf structures. See https://issues.apache.org/jira/browse/TEZ-305? Also the protobuf rpc engine? > LLAP: switch task heartbeats to protobuf > > > Key: HIVE-17294 > URL: https://issues.apache.org/jira/browse/HIVE-17294 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122512#comment-16122512 ] Siddharth Seth commented on HIVE-17256: --- Mostly looks good. Would be nice to have a few more tests on the ordering of the various queues / or even better the scheduler making correct decisions. A guaranteed task will also replace a non-guaranteed task, irrespective of finishable state? Wasn't there some potential for deadlocks with this? In terms of the todo wtf - iirc fixing that requires making some biggish changes in tez internals to prevent the same finish being registered multiple times over. Would be good to leave that as a comment if it does not exist, instead of the wtf. The original jira where that was added should have more context. > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as no one uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority, and > preempts a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
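To illustrate the two-level ordering being reviewed here, a sketch with hypothetical accessors (not the actual TaskExecutorService comparator):
{code}
import java.util.Comparator;

public final class TaskOrdering {
  // Hypothetical stand-in for the wrapper the LLAP queue orders.
  public interface TaskWrapper {
    boolean isGuaranteed();
    int getPriority(); // lower value = higher priority
  }

  // Guaranteed tasks always sort ahead of speculative ones; within a level,
  // tasks fall back to their existing priority.
  public static final Comparator<TaskWrapper> ORDER =
      Comparator.comparing((TaskWrapper t) -> !t.isGuaranteed())
                .thenComparingInt(TaskWrapper::getPriority);
}
{code}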
[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102291#comment-16102291 ] Siddharth Seth commented on HIVE-17160: --- The DagUtils part of the changes look good to me. I believe the existing code for non-cluster split generation was already non-functional from the testing you had done? If that's the case, there's no reason to add another option to say GenerateSplitsOnClientMethod1 vs GenerateSplitsOnClientMethod2. > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.
[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099157#comment-16099157 ] Siddharth Seth commented on HIVE-17019: --- Re-looked at the patch. Mostly looks good. Some comments and questions. - How is the context set up for LogDownloadServlet, e.g. CONF_LOG_DOWNLODER_NUM_EXECUTORS? The config should likely be set up in HiveConf in some way. - init for the servlet will happen once at startup? So if there are multiple requests to download, and the limit is hit, all webserver threads will block? Should we just return an error if there are too many parallel downloads, so that other parts of the UI continue to be functional? - In terms of the security - this becomes interesting. Essentially says that the feature will only work if authentication is enabled on secure clusters. - Timeout for the downloads as a separate jira? - Are any credentials required on the HttpClient created to download artifacts from various end points? - For Constants like TIMELINE_PATH_PREFIX - any chance YARN has a helper method? Otherwise we should file a jira to ask yarn to expose such utilities. - Both dagId and queryId cannot be specified at the same time? > Add support to download debugging information as an archive. > > > Key: HIVE-17019 > URL: https://issues.apache.org/jira/browse/HIVE-17019 > Project: Hive > Issue Type: Bug >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Attachments: HIVE-17019.01.patch, HIVE-17019.02.patch, > HIVE-17019.03.patch > > > Given a queryId or dagId, get all information related to it: like, tez am, > task logs, hive ats data, tez ats data, slider am status, etc. Package it > into an archive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
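A minimal sketch of the fail-fast alternative suggested above, assuming the standard servlet API (the limit of 4 and the class name are illustrative):
{code}
import java.io.IOException;
import java.util.concurrent.Semaphore;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LogDownloadServletSketch extends HttpServlet {
  // Bounded slots so downloads cannot tie up every webserver thread.
  private final Semaphore downloadSlots = new Semaphore(4);

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    if (!downloadSlots.tryAcquire()) {
      // Fail fast instead of blocking, so the rest of the UI stays responsive.
      resp.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE,
          "Too many concurrent debug-bundle downloads; retry later");
      return;
    }
    try {
      // ... build and stream the archive ...
    } finally {
      downloadSlots.release();
    }
  }
}
{code}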
[jira] [Commented] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client
[ https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090889#comment-16090889 ] Siddharth Seth commented on HIVE-17091: --- [~jdere] - after the change, the read itself removes any timeout handling. Instead, if task heartbeats are not received within a certain amount of time - that mechanism will cause the task to timeout? +1 for the patch, if my understanding is correct. > "Timed out getting readerEvents" error from external LLAP client > > > Key: HIVE-17091 > URL: https://issues.apache.org/jira/browse/HIVE-17091 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17091.1.patch > > > {noformat} > Caused by: java.io.IOException: Timed out getting readerEvents > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.getReaderEvent(LlapBaseRecordReader.java:261) > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:148) > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:48) > at > org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121) > at > org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size
[ https://issues.apache.org/jira/browse/HIVE-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090585#comment-16090585 ] Siddharth Seth commented on HIVE-16690: --- +1 for the addendum. If cross product does end up getting invoked, is it possible to log the configured value for parallelism? This can be in a follow up jira. Also, I think this cluster-state lookup is very avoidable if cross product is not invoked. Don't need to invoke the ZK registry for each query (even though it does cache records for 2 minutes). > Configure Tez cartesian product edge based on LLAP cluster size > --- > > Key: HIVE-16690 > URL: https://issues.apache.org/jira/browse/HIVE-16690 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16690.1.patch, HIVE-16690.addendum.patch > > > In HIVE-14731 we are using the default value for target parallelism of fair > cartesian product edge. Ideally this should be set according to cluster size. > In case of LLAP it's pretty easy to get cluster size, i.e., number of > executors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
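Combining the two suggestions above, a hedged sketch: look up the cluster size only when the plan actually has a cross product edge, and log what ends up configured. ClusterSizeSource is a hypothetical stand-in for the LLAP ZK registry (which caches records for ~2 minutes):
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class CrossProductParallelism {
  private static final Logger LOG = LoggerFactory.getLogger(CrossProductParallelism.class);

  public interface ClusterSizeSource {
    int executorCount();
  }

  public static int resolve(boolean planHasCrossProduct, int defaultParallelism,
      ClusterSizeSource llap) {
    if (!planHasCrossProduct) {
      return defaultParallelism; // skip the registry lookup entirely
    }
    int parallelism = Math.max(1, llap.executorCount());
    LOG.info("Configured cross product target parallelism: {}", parallelism);
    return parallelism;
  }
}
{code}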
[jira] [Updated] (HIVE-17093) LLAP ssl configs need to be localized to talk to a wire encrypted hdfs
[ https://issues.apache.org/jira/browse/HIVE-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-17093: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > LLAP ssl configs need to be localized to talk to a wire encrypted hdfs > -- > > Key: HIVE-17093 > URL: https://issues.apache.org/jira/browse/HIVE-17093 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 3.0.0 > > Attachments: HIVE-17093.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17093) LLAP ssl configs need to be localized to talk to a wire encrypted hdfs
[ https://issues.apache.org/jira/browse/HIVE-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088262#comment-16088262 ] Siddharth Seth commented on HIVE-17093: --- Thanks. Committing. > LLAP ssl configs need to be localized to talk to a wire encrypted hdfs > -- > > Key: HIVE-17093 > URL: https://issues.apache.org/jira/browse/HIVE-17093 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-17093.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.
[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088180#comment-16088180 ] Siddharth Seth commented on HIVE-17019: --- bq. The llap status is fetched using LlapStatusServiceDriver which is part of hive-llap-server. OK. llap-status should really be under its own module. Anyway, that can be changed later. Another alternative is to have llap-status hosted as a webservice. That can happen in a follow up. Would be really good to skip the llap-server dependency, which in turn pulls in a lot of others. bq. Creating a shared executor, does it make sense to use Guava's direct executor, which will schedule the task in the current thread. Sure. It could just be done inline otherwise? The main question is how the total number of downloads is restricted. Restricting the number of handlers on the web interface does not help, since that serves out a lot more than download debug artifacts. bq. For streaming directly, it would not be possible because of multithreading. If it's single threaded then I can use a ZipOutputStream and add entries one at a time. Think this is ok as long as files are cleaned up. May want to cap file sizes as well, since logs can get really large. bq. I was planning to remove this and integrate with hive cli, --service . This does not work without a lot of classpath fixes, or I'll have to create a script to add hive jars. Sounds good. bq. Will check a few libs, apache commons OptionBuilder uses a static instance in its builder. Should be ok for a CLI-based invoke-once app, but will look at something better along the lines of Python's argparse. Sounds good. Maybe in a follow up jira to get this in faster? bq. Sure, will do. Global or per download or both? Global, defined by the server. bq. Stop if no new sources could download or all sources are exhausted. Sounds good. I misread the code, sorry. bq. Jetty will handle the exception, returning 500 to the user. Not sure if the exception trace is part of it. Will try and see. Think this is ok as long as the error has enough information to let the user know what happened. > Add support to download debugging information as an archive. > > > Key: HIVE-17019 > URL: https://issues.apache.org/jira/browse/HIVE-17019 > Project: Hive > Issue Type: Bug >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Attachments: HIVE-17019.01.patch > > > Given a queryId or dagId, get all information related to it: like, tez am, > task logs, hive ats data, tez ats data, slider am status, etc. Package it > into an archive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
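The single-threaded streaming variant discussed in this exchange would look roughly like the sketch below; Artifact is a hypothetical stand-in for whatever the aggregator downloads. Writing entries straight to the response stream means nothing is staged on disk, so there is no temp file to leak if the client disconnects:
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public final class ArchiveStreamer {
  // Hypothetical artifact abstraction.
  public interface Artifact {
    String name();
    InputStream open() throws IOException;
  }

  public static void stream(OutputStream rawOut, Iterable<Artifact> artifacts)
      throws IOException {
    try (ZipOutputStream zip = new ZipOutputStream(rawOut)) {
      for (Artifact a : artifacts) {
        zip.putNextEntry(new ZipEntry(a.name()));
        try (InputStream in = a.open()) {
          byte[] buf = new byte[8192];
          int n;
          while ((n = in.read(buf)) != -1) {
            zip.write(buf, 0, n); // copy the entry body
          }
        }
        zip.closeEntry();
      }
    }
  }
}
{code}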
[jira] [Commented] (HIVE-17093) LLAP ssl configs need to be localized to talk to a wire encrypted hdfs
[ https://issues.apache.org/jira/browse/HIVE-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088096#comment-16088096 ] Siddharth Seth commented on HIVE-17093: --- [~gopalv] - don't think ssl shuffle works at the moment with llap. I can add the ssl-server changes back if you think that is required. > LLAP ssl configs need to be localized to talk to a wire encrypted hdfs > -- > > Key: HIVE-17093 > URL: https://issues.apache.org/jira/browse/HIVE-17093 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-17093.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17093) LLAP ssl configs need to be localized to talk to a wire encrypted hdfs
[ https://issues.apache.org/jira/browse/HIVE-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-17093: -- Status: Patch Available (was: Open) > LLAP ssl configs need to be localized to talk to a wire encrypted hdfs > -- > > Key: HIVE-17093 > URL: https://issues.apache.org/jira/browse/HIVE-17093 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-17093.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17093) LLAP ssl configs need to be localized to talk to a wire encrypted hdfs
[ https://issues.apache.org/jira/browse/HIVE-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-17093: -- Attachment: HIVE-17093.01.patch The patch localizes ssl-client.xml instead of ssl-server.xml, which is used by NNs/DNs etc. Also, it stops loading the configs, since they are loaded by relevant sections of DFSClient when required. [~gopalv] - can you please take a look? I don't think this breaks anything for the LLAP UI wire encryption case (already broken?) > LLAP ssl configs need to be localized to talk to a wire encrypted hdfs > -- > > Key: HIVE-17093 > URL: https://issues.apache.org/jira/browse/HIVE-17093 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-17093.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
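A hedged sketch of what localizing ssl-client.xml into the daemon containers involves, assuming the file has already been staged on HDFS and using the YARN LocalResource API (method and variable names are illustrative):
{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public final class SslClientLocalizer {
  // DFSClient then picks ssl-client.xml up from the container's working
  // directory when wire encryption is enabled.
  public static LocalResource sslClientResource(FileSystem fs, Path stagedSslClientXml)
      throws Exception {
    FileStatus st = fs.getFileStatus(stagedSslClientXml);
    return LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(st.getPath()),
        LocalResourceType.FILE,
        LocalResourceVisibility.APPLICATION,
        st.getLen(), st.getModificationTime());
  }
}
{code}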
[jira] [Assigned] (HIVE-17093) LLAP ssl configs need to be localized to talk to a wire encrypted hdfs
[ https://issues.apache.org/jira/browse/HIVE-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-17093: - > LLAP ssl configs need to be localized to talk to a wire encrypted hdfs > -- > > Key: HIVE-17093 > URL: https://issues.apache.org/jira/browse/HIVE-17093 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086607#comment-16086607 ] Siddharth Seth commented on HIVE-16926: --- +1. Looks good. There's a bunch of unused imports which I forgot to mention. Would be nice to remove those before commit. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch, HIVE-16926.5.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084817#comment-16084817 ] Siddharth Seth commented on HIVE-16926: --- bq. Is there any action needed on this part? I don't think there is, unless you see this as a problem for the running spark task. The number of threads created etc is quite small afaik. bq. Maybe I can just replace pendingClients/registeredClients with a single list and the RequestInfo can keep a state to show if the request is pending/running/etc. That'll work as well. Think there's still 2 places which have similar code related to heartbeats - heartbeat / nodePinged. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.
[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083111#comment-16083111 ] Siddharth Seth commented on HIVE-17019: --- Thanks for posting the patch. Will be useful to get relevant data for a query. - Change the top level package from llap-debug to tez-debug? (Works with both I believe) [~ashutoshc], [~thejas] - any recommendations on whether the code gets a top level module, or goes under an existing module. This allows downloading of various debug artifacts for a tez job - logs, metrics for llap, hiveserver2 logs (soon), tez am logs, ATS data for the query (hive and tez). - In the new pom.xml, dependency on hive-llap-server. 1) Is it required? 2) Will need to exclude some dependent artifacts. See service/pom.xml llap-server dependency handling - LogDownloadServlet - Should this throw an error as soon as the filename pattern validation fails? - LogDownloadServlet - change to dagId/queryId validation instead - LogDownloadServlet - thread being created inside of the request handler? This should be limited outside of the request? so that only a controlled number of parallel artifact downloads can run. - LogDownloadServlet - what happens in case of aggregator failure? Exception back to the user? - LogDownloadServlet - seems to be generating the file to disk and then streaming it over. Can this be streamed over directly instead? Otherwise there's the possibility of leaking files. (Artifact.downloadIntoStream or some such?) Guessing this is complicated further by the multi-threaded artifact downloader. Alternately need to have a cleanup mechanism. - Timeout on the tests - Apache header needs to be added to files where it is missing. - Main - Please rename to something more indicative of what the tool does. - Main - Likely a follow up jira - parse using a standard library, instead of trying to parse the arguments to main directly. - Server - Enabling the artifact should be controlled via a config. Does not always need to be hosted in HS2 (Default disabled, at least till security can be sorted out) - Is it possible to support a timeout on the downloads? (Can be a follow up jira) - ArtifactAggregator - I believe this does 2 stages of dependent artifacts / downloads? Stage1 - download whatever it can. Information from this should be adequate for stage2 downloads? - For the ones not implemented yet (DummyArtifact) - think it's better to just comment out the code, instead of invoking the DummyArtifacts downloader - Security - ACL enforcement required on secure clusters to make sure users can only download what they have access to. This is a must fix before this can be enabled by default. - Security - this can work around yarn restrictions on log downloads, since the files are being accessed by the hive user. Could you please add some details on cluster testing. > Add support to download debugging information as an archive. > > > Key: HIVE-17019 > URL: https://issues.apache.org/jira/browse/HIVE-17019 > Project: Hive > Issue Type: Bug >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Attachments: HIVE-17019.01.patch > > > Given a queryId or dagId, get all information related to it: like, tez am, > task logs, hive ats data, tez ats data, slider am status, etc. Package it > into an archive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082995#comment-16082995 ] Siddharth Seth commented on HIVE-16926: --- Functionally, looks good to me. Minor comments. - umbilicalServer.umbilicalProtocol.pendingClients.putIfAbsent -> Would be a little cleaner to add a method for this, similar to unregisterClient. - {code} + for (String key : umbilicalImpl.pendingClients.keySet()) { +LlapTaskUmbilicalExternalClient client = umbilicalImpl.pendingClients.get(key);{code} Replace with an iterator over the entrySet to avoid the get()? Also, this pattern is repeated in heartbeat and nodeHeartbeat - could likely be a method. If I'm not mistaken, the shared umbilical server will not be shut down ever? Maybe in a follow up - some of the static classes could be split out. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
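The entrySet suggestion amounts to the shape below (value type and method names are illustrative stand-ins for the umbilical's client bookkeeping):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class HeartbeatSweep {
  // Value type stands in for LlapTaskUmbilicalExternalClient.
  private final ConcurrentHashMap<String, Object> pendingClients =
      new ConcurrentHashMap<>();

  // One traversal over entrySet instead of a keySet walk plus a get() per key.
  public void onHeartbeat() {
    for (Map.Entry<String, Object> e : pendingClients.entrySet()) {
      String fragmentId = e.getKey();
      Object client = e.getValue(); // no second lookup
      // ... per-client heartbeat handling for (fragmentId, client) ...
    }
  }
}
{code}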
[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations
[ https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081123#comment-16081123 ] Siddharth Seth commented on HIVE-17067: --- +1. Looks good. Does anything specific need to be looked at in terms of security? > LLAP: Add http endpoint to provide system level configurations > -- > > Key: HIVE-17067 > URL: https://issues.apache.org/jira/browse/HIVE-17067 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17067.1.patch > > > Add an endpoint to get kernel and network configs via sysctl. Also memory > related configs like transparent huge pages config can be added. "ulimit -a" > can be added to llap startup script as it needs a shell. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056802#comment-16056802 ] Siddharth Seth commented on HIVE-16927: --- [~prasanth_j] - I don't think we should make setting this to 0 a permanent change. A bad instance will never stop on its own, and will keep trying to launch new containers. A better default would likely be numInstances, while making sure it is not too low (6 is the default for example), and the value is high enough to allow a node to be blacklisted. Option1: numInstances * threshold to mark a node as disabled. Option2: max(6, max(numInstances, threshold to mark a node as disabled)) Option3: ? An enhancement request to Slider to get better control over this would also be worthwhile. > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16927.1.patch > > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
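For reference, Option 2 above as arithmetic (parameter names are illustrative):
{code}
public final class FailureThresholds {
  // Never below Slider's default of 6, and at least as large as both the
  // daemon count and the per-node blacklisting threshold.
  public static int containerFailureThreshold(int numInstances, int nodeDisableThreshold) {
    return Math.max(6, Math.max(numInstances, nodeDisableThreshold));
  }
}
{code}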
[jira] [Commented] (HIVE-16847) LLAP queue order issue
[ https://issues.apache.org/jira/browse/HIVE-16847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054645#comment-16054645 ] Siddharth Seth commented on HIVE-16847: --- +1 > LLAP queue order issue > -- > > Key: HIVE-16847 > URL: https://issues.apache.org/jira/browse/HIVE-16847 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-16847.01.patch, HIVE-16847.01.patch, > HIVE-16847.02.patch, HIVE-16847.03.patch, HIVE-16847.04.patch, > HIVE-16847.05.patch, HIVE-16847.06.patch, HIVE-16847.patch > > > There's an ordering issue with the LLAP queue that we've seen on some run. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16847) LLAP queue order issue
[ https://issues.apache.org/jira/browse/HIVE-16847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054562#comment-16054562 ] Siddharth Seth commented on HIVE-16847: --- Mostly looks good. Couple of small changes before committing. - Would be good to completely remove EvictingPriorityBlockingQueue.reinsertIfExistsUnsafe - it is no longer used except in tests. - The synchronized block in getExecutorsStatus. What happens if this is removed? (The UI being able to affect scheduling is not great) > LLAP queue order issue > -- > > Key: HIVE-16847 > URL: https://issues.apache.org/jira/browse/HIVE-16847 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-16847.01.patch, HIVE-16847.01.patch, > HIVE-16847.02.patch, HIVE-16847.03.patch, HIVE-16847.04.patch, > HIVE-16847.05.patch, HIVE-16847.patch > > > There's an ordering issue with the LLAP queue that we've seen on some run. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit
[ https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047007#comment-16047007 ] Siddharth Seth commented on HIVE-16820: --- +1. I'd still change the isShutdown to a volatile, before commit, for the one opportunistic check to be more consistent. > TezTask may not shut down correctly before submit > - > > Key: HIVE-16820 > URL: https://issues.apache.org/jira/browse/HIVE-16820 > Project: Hive > Issue Type: Bug >Reporter: Visakh Nair >Assignee: Sergey Shelukhin > Attachments: HIVE-16820.01.patch, HIVE-16820.patch > > > The query will run and only fail at the very end when the driver checks its > own shutdown flag. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit
[ https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044822#comment-16044822 ] Siddharth Seth commented on HIVE-16820: --- Sorry for the delay. - Should isShutdown be volatile for visibility outside of the dagLock synchronized block? - Think it is possible for two invocations of tryKillDag. submitDag sets this.dagClient. shutdown sees this, and invokes tryKillDag. The submit thread starts up again and sees isShutdown=true. Otherwise, looks good. > TezTask may not shut down correctly before submit > - > > Key: HIVE-16820 > URL: https://issues.apache.org/jira/browse/HIVE-16820 > Project: Hive > Issue Type: Bug >Reporter: Visakh Nair >Assignee: Sergey Shelukhin > Attachments: HIVE-16820.patch > > > The query will run and only fail at the very end when the driver checks its > own shutdown flag. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
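A sketch of the handshake under discussion, not the actual TezTask code: a volatile flag for the opportunistic check outside the lock, with the authoritative check and the dagClient handoff both under dagLock, so a kill cannot be lost between submit and the flag read:
{code}
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.client.DAGClient;

public class SubmitShutdownSketch {
  private final Object dagLock = new Object();
  private volatile boolean isShutdown = false;
  private DAGClient dagClient;

  public void shutdown() throws Exception {
    synchronized (dagLock) {
      isShutdown = true;
      if (dagClient != null) {
        dagClient.tryKillDAG(); // already submitted: kill it
      }
    }
  }

  public void submit(TezClient session, DAG dag) throws Exception {
    if (isShutdown) {
      return; // opportunistic early exit; volatile makes the write visible
    }
    DAGClient client = session.submitDAG(dag);
    synchronized (dagLock) {
      dagClient = client;
      if (isShutdown) {
        dagClient.tryKillDAG(); // shutdown raced with submit; kill is not lost
      }
    }
  }
}
{code}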
[jira] [Commented] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort
[ https://issues.apache.org/jira/browse/HIVE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035451#comment-16035451 ] Siddharth Seth commented on HIVE-16460: --- +1 > In the console output, show vertex list in topological order instead of an > alphabetical sort > > > Key: HIVE-16460 > URL: https://issues.apache.org/jira/browse/HIVE-16460 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Prasanth Jayachandran > Attachments: HIVE-16460.1.patch, HIVE-16460.2.patch > > > cc [~prasanth_j] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort
[ https://issues.apache.org/jira/browse/HIVE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035224#comment-16035224 ] Siddharth Seth commented on HIVE-16460: --- bq. So I would assume so Missed the javadoc comment. Hope it's valid :) Think it's best to add the check and commit this. > In the console output, show vertex list in topological order instead of an > alphabetical sort > > > Key: HIVE-16460 > URL: https://issues.apache.org/jira/browse/HIVE-16460 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Prasanth Jayachandran > Attachments: HIVE-16460.1.patch > > > cc [~prasanth_j] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort
[ https://issues.apache.org/jira/browse/HIVE-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033732#comment-16033732 ] Siddharth Seth commented on HIVE-16460: --- Is baseWork topo sorted? Does each BaseWork instance always correspond to a Vertex in Tez (I'm not sure this is the case, e.g. unions)? Adding a check to ensure that the BaseWork exists in the Vertex information returned by Tez will be useful. If I'm not mistaken, removing the sort in the existing code would be sufficient, and Tez will return the vertices in topologically sorted order. Can't really rely on this though, since it's not part of the API spec. > In the console output, show vertex list in topological order instead of an > alphabetical sort > > > Key: HIVE-16460 > URL: https://issues.apache.org/jira/browse/HIVE-16460 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Prasanth Jayachandran > Attachments: HIVE-16460.1.patch > > > cc [~prasanth_j] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
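The check suggested above, trusting the plan's order but verifying against what Tez reports, could be as simple as this sketch (names are illustrative):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public final class VertexOrdering {
  // Keep the plan's (assumed topological) order, but only report names that
  // Tez actually knows as vertices; BaseWork does not always map 1:1 to a
  // Vertex (e.g. unions).
  public static List<String> orderedVertexNames(List<String> workNamesInTopoOrder,
      Set<String> vertexNamesFromTez) {
    List<String> ordered = new ArrayList<>();
    for (String name : workNamesInTopoOrder) {
      if (vertexNamesFromTez.contains(name)) {
        ordered.add(name);
      }
    }
    return ordered;
  }
}
{code}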
[jira] [Commented] (HIVE-10848) LLAP: Better handling of hostnames when sending heartbeats to the AM
[ https://issues.apache.org/jira/browse/HIVE-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033375#comment-16033375 ] Siddharth Seth commented on HIVE-10848: --- [~sershe] - you'll need to translate that. > LLAP: Better handling of hostnames when sending heartbeats to the AM > > > Key: HIVE-10848 > URL: https://issues.apache.org/jira/browse/HIVE-10848 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Siddharth Seth > Fix For: llap > > > Daemons send an alive message to the listening co-ordinator - along with the > daemon's hostname, which is used to keep tasks alive. > This can be problematic with hostname resolution if the AM and daemons end up > using different hostnames. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16777) LLAP: Use separate tokens and UGI instances when an external client is used
[ https://issues.apache.org/jira/browse/HIVE-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16777: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > LLAP: Use separate tokens and UGI instances when an external client is used > --- > > Key: HIVE-16777 > URL: https://issues.apache.org/jira/browse/HIVE-16777 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 3.0.0 > > Attachments: HIVE-16777.01.patch > > > Otherwise leads to errors since the token is shared, and there's different > nodes running Umbilical. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16777) LLAP: Use separate tokens and UGI instances when an external client is used
[ https://issues.apache.org/jira/browse/HIVE-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027510#comment-16027510 ] Siddharth Seth commented on HIVE-16777: --- Test failures are not related. Committing. Created HIVE-16781 to track improvements. > LLAP: Use separate tokens and UGI instances when an external client is used > --- > > Key: HIVE-16777 > URL: https://issues.apache.org/jira/browse/HIVE-16777 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16777.01.patch > > > Otherwise leads to errors since the token is shared, and there's different > nodes running Umbilical. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16777) LLAP: Use separate tokens and UGI instances when an external client is used
[ https://issues.apache.org/jira/browse/HIVE-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027062#comment-16027062 ] Siddharth Seth commented on HIVE-16777: --- True. Multiple across hosts would need to be handled. The client needs to stop creating an umbilical per fragment. > LLAP: Use separate tokens and UGI instances when an external client is used > --- > > Key: HIVE-16777 > URL: https://issues.apache.org/jira/browse/HIVE-16777 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16777.01.patch > > > Otherwise leads to errors since the token is shared, and there are different > nodes running Umbilical. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16777) LLAP: Use separate tokens and UGI instances when an external client is used
[ https://issues.apache.org/jira/browse/HIVE-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16777: -- Target Version/s: 3.0.0 Status: Patch Available (was: Open) > LLAP: Use separate tokens and UGI instances when an external client is used > --- > > Key: HIVE-16777 > URL: https://issues.apache.org/jira/browse/HIVE-16777 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16777.01.patch > > > Otherwise leads to errors since the token is shared, and there are different > nodes running Umbilical. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16777) LLAP: Use separate tokens and UGI instances when an external client is used
[ https://issues.apache.org/jira/browse/HIVE-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16777: -- Attachment: HIVE-16777.01.patch cc [~sershe] for review. > LLAP: Use separate tokens and UGI instances when an external client is used > --- > > Key: HIVE-16777 > URL: https://issues.apache.org/jira/browse/HIVE-16777 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16777.01.patch > > > Otherwise leads to errors since the token is shared, and there are different > nodes running Umbilical. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16777) LLAP: Use separate tokens and UGI instances when an external client is used
[ https://issues.apache.org/jira/browse/HIVE-16777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-16777: - > LLAP: Use separate tokens and UGI instances when an external client is used > --- > > Key: HIVE-16777 > URL: https://issues.apache.org/jira/browse/HIVE-16777 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > Otherwise leads to errors since the token is shared, and there are different > nodes running Umbilical. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16754) LLAP: Print hive version info on llap daemon startup
[ https://issues.apache.org/jira/browse/HIVE-16754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023708#comment-16023708 ] Siddharth Seth commented on HIVE-16754: --- +1 > LLAP: Print hive version info on llap daemon startup > > > Key: HIVE-16754 > URL: https://issues.apache.org/jira/browse/HIVE-16754 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Attachments: HIVE-16754.1.patch > > > For debugging purpose, print out hive version info on llap daemon startup. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-14052) Cleanup structures when external clients use LLAP
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14052: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > Cleanup structures when external clients use LLAP > - > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Siddharth Seth > Fix For: 3.0.0 > > Attachments: HIVE-14052.02.patch, HIVE-14052.04.patch, > HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-14052) Cleanup structures when external clients use LLAP
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14052: -- Summary: Cleanup structures when external clients use LLAP (was: Cleanup of structures required when LLAP access from external clients completes) > Cleanup structures when external clients use LLAP > - > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Siddharth Seth > Attachments: HIVE-14052.02.patch, HIVE-14052.04.patch, > HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016319#comment-16016319 ] Siddharth Seth commented on HIVE-14052: --- Thanks. Committing. > Cleanup of structures required when LLAP access from external clients > completes > --- > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Siddharth Seth > Attachments: HIVE-14052.02.patch, HIVE-14052.04.patch, > HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16703) Hive may add the same file to the session and vertex in Tez
[ https://issues.apache.org/jira/browse/HIVE-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016281#comment-16016281 ] Siddharth Seth commented on HIVE-16703: --- +1 > Hive may add the same file to the session and vertex in Tez > --- > > Key: HIVE-16703 > URL: https://issues.apache.org/jira/browse/HIVE-16703 > Project: Hive > Issue Type: Bug >Reporter: Mahesh Balakrishnan >Assignee: Sergey Shelukhin > Attachments: HIVE-16703.patch > > > In newer versions of Tez, a check was added that treats this as an error. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16692) LLAP: Keep alive connection in shuffle handler should not be closed until entire data is flushed out
[ https://issues.apache.org/jira/browse/HIVE-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16692: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed the addendum patch. > LLAP: Keep alive connection in shuffle handler should not be closed until > entire data is flushed out > > > Key: HIVE-16692 > URL: https://issues.apache.org/jira/browse/HIVE-16692 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16692.02.patch, HIVE-16692.1.patch, > HIVE-16692.addendum.patch > > > In corner cases with keep-alive enabled, it is possible that the headers are > written out in the response and downstream was able to read the headers. > But possible that the mapOutput construction took a lot longer time (due to > disk or any other issue) in server side. In the mean time, keep alive timeout > can kick in and close the connection from server side. In such cases, there > is a possibility that downstream can get "connection reset". Ideally keep > alive should kick in only after flushing entire response downstream. > e.g error msg in client side > {noformat} > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:209) > ~[?:1.8.0_112] > at java.net.SocketInputStream.read(SocketInputStream.java:141) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) > ~[?:1.8.0_112] > at > org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:460) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:492) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:417) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:215) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > ~[tez-common-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_112] > at 
java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] > {noformat} > This corner case handling was not pulled in earlier from MR handler fixes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
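The flush-before-close behavior described in HIVE-16692 can be sketched in a few lines of Netty (the shuffle handler is Netty-based, but this is an illustrative sketch with invented names, not the actual patch): instead of closing the channel eagerly, the close is attached as a listener on the write future, so it only fires once the whole response has been handed to the socket.
{code}
// Hypothetical Netty 4 sketch of "close only after the response is flushed";
// the real LLAP shuffle handler differs in structure.
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;

final class ResponseWriter {
  void writeAndMaybeClose(ChannelHandlerContext ctx, Object response, boolean keepAlive) {
    ChannelFuture future = ctx.writeAndFlush(response);
    if (!keepAlive) {
      // Wrong: ctx.close() here could race the in-flight write and reset the peer.
      // Right: close only once the write future completes.
      future.addListener(ChannelFutureListener.CLOSE);
    }
    // With keep-alive, the connection stays open; any idle-timeout logic should
    // only start counting after this future completes, which is the gap the
    // "connection reset" report above points at.
  }
}
{code}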
[jira] [Commented] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016270#comment-16016270 ] Siddharth Seth commented on HIVE-14052: --- An AppId needs to be created (required for a bunch of processing). We also need a hive queryId, which is a string. The appId was being used as this unique identifier. However, it wasn't being used everywhere, which resulted in multiple log files. The patch moves the AppId creation further up, and uses it in the config so that a single log file is created. Same as a regular query - 2 appIds. One for the query, one for LLAP itself. > Cleanup of structures required when LLAP access from external clients > completes > --- > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Siddharth Seth > Attachments: HIVE-14052.02.patch, HIVE-14052.04.patch, > HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-13673) LLAP: handle case where no service instance is found on the host specified in the input split
[ https://issues.apache.org/jira/browse/HIVE-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016175#comment-16016175 ] Siddharth Seth commented on HIVE-13673: --- +1 Looks good. Minor: Probably better to create the random instance up front, if BaseInputFormat is likely to be used multiple times, and especially if it's from different threads. The way this is handled for Hive queries, the next node isn't exactly random. There's sequencing involved to increase the chances of a cache hit. https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java#L831 I think it's worth creating a follow-up jira to use this approach. > LLAP: handle case where no service instance is found on the host specified in > the input split > - > > Key: HIVE-13673 > URL: https://issues.apache.org/jira/browse/HIVE-13673 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-13673.1.patch > > > From [~sseth] on review of HIVE-13620, in regards to > LlapBaseInputFormat.getServiceInstance() and how to handle the case of no > LLAP service instance for the host specified in the LLAP input split: > {quote} > This should really be a jira and TODO (post merge to master) - to either 1) > go to an alternate random address from the available llap instances, or 2) > have additional locations provided by HS2. > I'd lean towards 1. It's absolutely possible for an llap instance to go down, > or the node to go down, which would end up causing failures. > {quote} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
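One way to address the nit about creating the random instance up front, and to keep it safe when the input format is used from several threads, is to avoid per-call Random allocation entirely. A hedged sketch with invented names (not the HIVE-13673 patch):
{code}
// Hypothetical sketch: pick a fallback LLAP instance when the split's host has
// none, using a randomness source that is cheap and safe across threads.
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class InstancePicker {
  // ThreadLocalRandom avoids both per-call "new Random()" allocation and
  // contention on one shared java.util.Random under multi-threaded use.
  static <T> T pickFallback(List<T> liveInstances) {
    if (liveInstances.isEmpty()) {
      throw new IllegalStateException("No LLAP service instances are registered");
    }
    return liveInstances.get(ThreadLocalRandom.current().nextInt(liveInstances.size()));
  }
}
{code}
The non-random, sequenced selection used by the Hive-query path (linked above) trades this simplicity for better cache locality, which is what the suggested follow-up jira would adopt.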
[jira] [Updated] (HIVE-16692) LLAP: Keep alive connection in shuffle handler should not be closed until entire data is flushed out
[ https://issues.apache.org/jira/browse/HIVE-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16692: -- Status: Patch Available (was: Reopened) > LLAP: Keep alive connection in shuffle handler should not be closed until > entire data is flushed out > > > Key: HIVE-16692 > URL: https://issues.apache.org/jira/browse/HIVE-16692 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16692.02.patch, HIVE-16692.1.patch, > HIVE-16692.addendum.patch > > > In corner cases with keep-alive enabled, it is possible that the headers are > written out in the response and downstream was able to read the headers. > But possible that the mapOutput construction took a lot longer time (due to > disk or any other issue) in server side. In the mean time, keep alive timeout > can kick in and close the connection from server side. In such cases, there > is a possibility that downstream can get "connection reset". Ideally keep > alive should kick in only after flushing entire response downstream. > e.g error msg in client side > {noformat} > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:209) > ~[?:1.8.0_112] > at java.net.SocketInputStream.read(SocketInputStream.java:141) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) > ~[?:1.8.0_112] > at > org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:460) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:492) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:417) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:215) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > ~[tez-common-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_112] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] > {noformat} > This corner 
case handling was not pulled in earlier from MR handler fixes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16692) LLAP: Keep alive connection in shuffle handler should not be closed until entire data is flushed out
[ https://issues.apache.org/jira/browse/HIVE-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16692: -- Attachment: HIVE-16692.02.patch > LLAP: Keep alive connection in shuffle handler should not be closed until > entire data is flushed out > > > Key: HIVE-16692 > URL: https://issues.apache.org/jira/browse/HIVE-16692 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16692.02.patch, HIVE-16692.1.patch, > HIVE-16692.addendum.patch > > > In corner cases with keep-alive enabled, it is possible that the headers are > written out in the response and downstream was able to read the headers. > But possible that the mapOutput construction took a lot longer time (due to > disk or any other issue) in server side. In the mean time, keep alive timeout > can kick in and close the connection from server side. In such cases, there > is a possibility that downstream can get "connection reset". Ideally keep > alive should kick in only after flushing entire response downstream. > e.g error msg in client side > {noformat} > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:209) > ~[?:1.8.0_112] > at java.net.SocketInputStream.read(SocketInputStream.java:141) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) > ~[?:1.8.0_112] > at > org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:460) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:492) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:417) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:215) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > ~[tez-common-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_112] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] > {noformat} > This corner case 
handling was not pulled in earlier from MR handler fixes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (HIVE-16692) LLAP: Keep alive connection in shuffle handler should not be closed until entire data is flushed out
[ https://issues.apache.org/jira/browse/HIVE-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reopened HIVE-16692: --- Re-opening, and re-uploading patch with a different name for jenkins. > LLAP: Keep alive connection in shuffle handler should not be closed until > entire data is flushed out > > > Key: HIVE-16692 > URL: https://issues.apache.org/jira/browse/HIVE-16692 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16692.1.patch, HIVE-16692.addendum.patch > > > In corner cases with keep-alive enabled, it is possible that the headers are > written out in the response and downstream was able to read the headers. > But possible that the mapOutput construction took a lot longer time (due to > disk or any other issue) in server side. In the mean time, keep alive timeout > can kick in and close the connection from server side. In such cases, there > is a possibility that downstream can get "connection reset". Ideally keep > alive should kick in only after flushing entire response downstream. > e.g error msg in client side > {noformat} > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:209) > ~[?:1.8.0_112] > at java.net.SocketInputStream.read(SocketInputStream.java:141) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) > ~[?:1.8.0_112] > at > org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:460) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:492) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:417) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:215) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > ~[tez-common-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_112] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] > {noformat} > This 
corner case handling was not pulled in earlier from MR handler fixes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14052: -- Attachment: HIVE-14052.04.patch Updated patch. > Cleanup of structures required when LLAP access from external clients > completes > --- > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Siddharth Seth > Attachments: HIVE-14052.02.patch, HIVE-14052.04.patch, > HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16692) LLAP: Keep alive connection in shuffle handler should not be closed until entire data is flushed out
[ https://issues.apache.org/jira/browse/HIVE-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16692: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > LLAP: Keep alive connection in shuffle handler should not be closed until > entire data is flushed out > > > Key: HIVE-16692 > URL: https://issues.apache.org/jira/browse/HIVE-16692 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16692.1.patch > > > In corner cases with keep-alive enabled, it is possible that the headers are > written out in the response and downstream was able to read the headers. > But possible that the mapOutput construction took a lot longer time (due to > disk or any other issue) in server side. In the mean time, keep alive timeout > can kick in and close the connection from server side. In such cases, there > is a possibility that downstream can get "connection reset". Ideally keep > alive should kick in only after flushing entire response downstream. > e.g error msg in client side > {noformat} > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:209) > ~[?:1.8.0_112] > at java.net.SocketInputStream.read(SocketInputStream.java:141) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) > ~[?:1.8.0_112] > at > org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:460) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:492) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:417) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:215) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > ~[tez-common-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_112] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] > {noformat} > This corner case handling 
was not pulled in earlier from MR handler fixes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16692) LLAP: Keep alive connection in shuffle handler should not be closed until entire data is flushed out
[ https://issues.apache.org/jira/browse/HIVE-16692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014517#comment-16014517 ] Siddharth Seth commented on HIVE-16692: --- +1. Thanks [~rajesh.balamohan]. Test failures are not related. > LLAP: Keep alive connection in shuffle handler should not be closed until > entire data is flushed out > > > Key: HIVE-16692 > URL: https://issues.apache.org/jira/browse/HIVE-16692 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-16692.1.patch > > > In corner cases with keep-alive enabled, it is possible that the headers are > written out in the response and downstream was able to read the headers. > But possible that the mapOutput construction took a lot longer time (due to > disk or any other issue) in server side. In the mean time, keep alive timeout > can kick in and close the connection from server side. In such cases, there > is a possibility that downstream can get "connection reset". Ideally keep > alive should kick in only after flushing entire response downstream. > e.g error msg in client side > {noformat} > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:209) > ~[?:1.8.0_112] > at java.net.SocketInputStream.read(SocketInputStream.java:141) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > ~[?:1.8.0_112] > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > ~[?:1.8.0_112] > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) > ~[?:1.8.0_112] > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) > ~[?:1.8.0_112] > at > org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:460) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:492) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:417) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:215) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73) > ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at > org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > ~[tez-common-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_112] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] > {noformat} > This 
corner case handling was not pulled in earlier from MR handler fixes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013308#comment-16013308 ] Siddharth Seth commented on HIVE-14052: --- bq. Also just wondering, why does external submission be handled differently, cannot external client make the same calls at Tez AM? Mainly because there's no central system which can come in and inform daemons when a query completes. (Some of this could potentially be done for the regular flow as well). Additional enhancements are required to detect when an AM goes down. This will likely be via ZK at some point. Will need to re-visit this at that point, to see if external clients can use the same process. > Cleanup of structures required when LLAP access from external clients > completes > --- > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Siddharth Seth > Attachments: HIVE-14052.02.patch, HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-14052: - Assignee: Siddharth Seth (was: Jason Dere) > Cleanup of structures required when LLAP access from external clients > completes > --- > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Siddharth Seth > Attachments: HIVE-14052.02.patch, HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14052: -- Status: Patch Available (was: Open) > Cleanup of structures required when LLAP access from external clients > completes > --- > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-14052.02.patch, HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes
[ https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14052: -- Attachment: HIVE-14052.02.patch Attaching an updated patch which does the following. - Avoids local dir creation in case of external services - Attempts to clean up structures on fragment completion. - Queues some cleanup for a later point. Note: One side effect could be overwritten log files (if new fragments show up after a minute of the last known fragment completing). [~jdere], [~sershe] - could you please take a look? > Cleanup of structures required when LLAP access from external clients > completes > --- > > Key: HIVE-14052 > URL: https://issues.apache.org/jira/browse/HIVE-14052 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-14052.02.patch, HIVE-14052.1.patch > > > Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP > to track a query will keep building up slowly over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
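The "queues some cleanup for a later point" step might look roughly like the following (a hypothetical sketch, not the HIVE-14052 patch; the one-minute grace period comes from the comment above, everything else is invented):
{code}
// Hypothetical sketch: evict per-query tracking state only after a grace
// period, in case late fragments for the same query still arrive.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class QueryTrackerCleanup {
  private final ConcurrentMap<String, Object> queryState = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void onLastFragmentComplete(String queryId) {
    // Delay the eviction; a fragment showing up after this window would
    // recreate state (and, as noted above, could overwrite the query's log file).
    scheduler.schedule(() -> { queryState.remove(queryId); }, 60, TimeUnit.SECONDS);
  }
}
{code}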
[jira] [Updated] (HIVE-16655) LLAP: Avoid preempting fragments before they enter the running state
[ https://issues.apache.org/jira/browse/HIVE-16655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16655: -- Status: Patch Available (was: Open) > LLAP: Avoid preempting fragments before they enter the running state > > > Key: HIVE-16655 > URL: https://issues.apache.org/jira/browse/HIVE-16655 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16655.01.patch, HIVE-16655.02.patch > > > Currently in the AM, fragments may be preempted as soon as they are > allocated, without knowing whether they will move into the RUNNING state or > not. Leads to a lot of unnecessary kills. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16655) LLAP: Avoid preempting fragments before they enter the running state
[ https://issues.apache.org/jira/browse/HIVE-16655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16655: -- Attachment: HIVE-16655.02.patch Updated patch with the tez dependency changed to run ptests. Have already tested this on a cluster. > LLAP: Avoid preempting fragments before they enter the running state > > > Key: HIVE-16655 > URL: https://issues.apache.org/jira/browse/HIVE-16655 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16655.01.patch, HIVE-16655.02.patch > > > Currently in the AM, fragments may be preempted as soon as they are > allocated, without knowing whether they will move into the RUNNING state or > not. Leads to a lot of unnecessary kills. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16639) LLAP: Derive shuffle thread counts and keep-alive connections from instance count
[ https://issues.apache.org/jira/browse/HIVE-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16639: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > LLAP: Derive shuffle thread counts and keep-alive connections from instance > count > - > > Key: HIVE-16639 > URL: https://issues.apache.org/jira/browse/HIVE-16639 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Siddharth Seth > Fix For: 3.0.0 > > Attachments: HIVE-16639.01.patch, HIVE-16639.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16639) LLAP: Derive shuffle thread counts and keep-alive connections from instance count
[ https://issues.apache.org/jira/browse/HIVE-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008855#comment-16008855 ] Siddharth Seth commented on HIVE-16639: --- Test failures are not related. Verified locally. Committing. Thanks for the review, [~gopalv] > LLAP: Derive shuffle thread counts and keep-alive connections from instance > count > - > > Key: HIVE-16639 > URL: https://issues.apache.org/jira/browse/HIVE-16639 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Siddharth Seth > Attachments: HIVE-16639.01.patch, HIVE-16639.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16639) LLAP: Derive shuffle thread counts and keep-alive connections from instance count
[ https://issues.apache.org/jira/browse/HIVE-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16639: -- Attachment: HIVE-16639.02.patch Updated patch. > LLAP: Derive shuffle thread counts and keep-alive connections from instance > count > - > > Key: HIVE-16639 > URL: https://issues.apache.org/jira/browse/HIVE-16639 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Siddharth Seth > Attachments: HIVE-16639.01.patch, HIVE-16639.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16634) LLAP Use a pool of connections to a single AM from a daemon
[ https://issues.apache.org/jira/browse/HIVE-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16634: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > LLAP Use a pool of connections to a single AM from a daemon > --- > > Key: HIVE-16634 > URL: https://issues.apache.org/jira/browse/HIVE-16634 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Siddharth Seth > Fix For: 3.0.0 > > Attachments: HIVE-16634.01.patch, HIVE-16634.02.patch, > locked-threads-ipc.png > > > !locked-threads-ipc.png! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16634) LLAP Use a pool of connections to a single AM from a daemon
[ https://issues.apache.org/jira/browse/HIVE-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008715#comment-16008715 ] Siddharth Seth commented on HIVE-16634: --- Committing the .1 patch. I don't think the .2 patch test failures are related, but I don't like the patch too much. Will create a new jira to fix the pool size properly. Thanks for the review [~sershe] > LLAP Use a pool of connections to a single AM from a daemon > --- > > Key: HIVE-16634 > URL: https://issues.apache.org/jira/browse/HIVE-16634 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Siddharth Seth > Attachments: HIVE-16634.01.patch, HIVE-16634.02.patch, > locked-threads-ipc.png > > > !locked-threads-ipc.png! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16652) LlapInputFormat: Seeing "output error" WARN message
[ https://issues.apache.org/jira/browse/HIVE-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008711#comment-16008711 ] Siddharth Seth commented on HIVE-16652: --- +1. Looks good. > LlapInputFormat: Seeing "output error" WARN message > --- > > Key: HIVE-16652 > URL: https://issues.apache.org/jira/browse/HIVE-16652 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16652.1.patch > > > Another warning message I'm seeing in the logs for TestJdbcWithMiniLlap after > adding the line to close the RecordReader in the test: > {noformat} > 2017-05-11T11:08:34,511 WARN [IPC Server handler 0 on 54847] ipc.Server: IPC > Server handler 0 on 54847, call Call#341 Retry#0 heartbeat({ > containerId=container_6830411502416918223_0003_00_00, requestId=2, > startIndex=0, preRoutedStartIndex=0, maxEventsToGet=500, > taskAttemptId=attempt_6830411502416918223_0003_0_00_00_0, eventCount=2 > }), rpc version=2, client version=1, methodsFingerPrint=996603002 from > 10.22.8.180:54849: output error > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16651) LlapProtocolClientProxy stack trace when using llap input format
[ https://issues.apache.org/jira/browse/HIVE-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008700#comment-16008700 ] Siddharth Seth commented on HIVE-16651: --- +1 > LlapProtocolClientProxy stack trace when using llap input format > > > Key: HIVE-16651 > URL: https://issues.apache.org/jira/browse/HIVE-16651 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16651.1.patch > > > Seeing this after LlapBaseRecordReader.close(): > {noformat} > 16/06/28 22:05:32 WARN LlapProtocolClientProxy: RequestManager shutdown with > error > java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) > at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > at > com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) > at > com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) > at > com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91) > at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384) > at java.util.concurrent.FutureTask.cancel(FutureTask.java:180) > at > org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy.serviceStop(LlapProtocolClientProxy.java:131) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient.serviceStop(LlapTaskUmbilicalExternalClient.java:135) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:250) > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.close(LlapBaseRecordReader.java:84) > at > org.apache.hadoop.hive.llap.LlapRowRecordReader.close(LlapRowRecordReader.java:80) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16655) LLAP: Avoid preempting fragments before they enter the running state
[ https://issues.apache.org/jira/browse/HIVE-16655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16655: -- Attachment: HIVE-16655.01.patch The patch. Relies on the start notifications from Tez to determine the state of a fragment, and preempt accordingly. cc [~prasanth_j], [~sershe] for review. Cannot submit to jenkins since the Tez patch introduces new APIs. > LLAP: Avoid preempting fragments before they enter the running state > > > Key: HIVE-16655 > URL: https://issues.apache.org/jira/browse/HIVE-16655 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16655.01.patch > > > Currently in the AM, fragments may be preempted as soon as they are > allocated, without knowing whether they will move into the RUNNING state or > not. Leads to a lot of unnecessary kills. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
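The guard this patch describes can be sketched as follows (hypothetical names; the actual HIVE-16655 change works against Tez's scheduler plugin APIs): a fragment only becomes eligible for preemption once the start notification from Tez arrives.
{code}
// Hypothetical sketch: track fragment state and only allow preemption of
// fragments that have actually entered the RUNNING state.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PreemptionGuard {
  private enum State { ALLOCATED, RUNNING }
  private final ConcurrentMap<String, State> fragments = new ConcurrentHashMap<>();

  void onAllocated(String fragmentId) { fragments.put(fragmentId, State.ALLOCATED); }

  // Invoked when the Tez start notification arrives for this fragment.
  void onStarted(String fragmentId) { fragments.put(fragmentId, State.RUNNING); }

  boolean canPreempt(String fragmentId) {
    // Preempting an ALLOCATED-but-not-yet-RUNNING fragment kills it for
    // nothing, which is exactly the waste the jira describes.
    return fragments.get(fragmentId) == State.RUNNING;
  }
}
{code}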
[jira] [Assigned] (HIVE-16655) LLAP: Avoid preempting fragments before they enter the running state
[ https://issues.apache.org/jira/browse/HIVE-16655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-16655: - > LLAP: Avoid preempting fragments before they enter the running state > > > Key: HIVE-16655 > URL: https://issues.apache.org/jira/browse/HIVE-16655 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > Currently in the AM, fragments may be preempted as soon as they are > allocated, without knowing whether they will move into the RUNNING state or > not. Leads to a lot of unnecessary kills. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16639) LLAP: Derive shuffle thread counts and keep-alive connections from instance count
[ https://issues.apache.org/jira/browse/HIVE-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16639: -- Status: Patch Available (was: Open) > LLAP: Derive shuffle thread counts and keep-alive connections from instance > count > - > > Key: HIVE-16639 > URL: https://issues.apache.org/jira/browse/HIVE-16639 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Siddharth Seth > Attachments: HIVE-16639.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16639) LLAP: Derive shuffle thread counts and keep-alive connections from instance count
[ https://issues.apache.org/jira/browse/HIVE-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16639: -- Attachment: HIVE-16639.01.patch Enables keep-alive by default. Thread count set to 3x the cpu count. -Dhttp.maxconnections set to numInstances + 1. [~gopalv], [~rajesh.balamohan] - could you please take a look? > LLAP: Derive shuffle thread counts and keep-alive connections from instance > count > - > > Key: HIVE-16639 > URL: https://issues.apache.org/jira/browse/HIVE-16639 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Siddharth Seth > Attachments: HIVE-16639.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
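A rough sketch of the sizing rules in this comment (hypothetical class and method names; the real patch presumably wires these through Hive's configuration rather than setting JVM properties directly, and the http.* properties must be set before the first HttpURLConnection is created):
{code}
// Hypothetical sketch: derive shuffle thread count from the CPU count and size
// the JDK's HttpURLConnection keep-alive pool from the daemon instance count.
final class ShuffleSizing {
  static int shuffleThreads() {
    return 3 * Runtime.getRuntime().availableProcessors();
  }

  static void configureKeepAlive(int numDaemonInstances) {
    System.setProperty("http.keepAlive", "true");
    // http.maxconnections is the JDK's per-host cap on pooled keep-alive
    // connections: one per peer daemon, plus one spare.
    System.setProperty("http.maxconnections", String.valueOf(numDaemonInstances + 1));
  }
}
{code}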
[jira] [Assigned] (HIVE-16639) LLAP: Derive shuffle thread counts and keep-alive connections from instance count
[ https://issues.apache.org/jira/browse/HIVE-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-16639: - Assignee: Siddharth Seth > LLAP: Derive shuffle thread counts and keep-alive connections from instance > count > - > > Key: HIVE-16639 > URL: https://issues.apache.org/jira/browse/HIVE-16639 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Siddharth Seth > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16634) LLAP Use a pool of connections to a single AM from a daemon
[ https://issues.apache.org/jira/browse/HIVE-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16634: -- Attachment: HIVE-16634.02.patch Updated to add a cap. One downside is that it will always create the configured instances. Don't think seeding with a single instance helps a lot. > LLAP Use a pool of connections to a single AM from a daemon > --- > > Key: HIVE-16634 > URL: https://issues.apache.org/jira/browse/HIVE-16634 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Siddharth Seth > Attachments: HIVE-16634.01.patch, HIVE-16634.02.patch, > locked-threads-ipc.png > > > !locked-threads-ipc.png! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
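As an illustration of the pooling idea (a hedged sketch with invented names, not the HIVE-16634 patch), a capped pool created eagerly — matching the downside noted above — with round-robin handout could look like:
{code}
// Hypothetical sketch: a fixed-size pool of connections to one AM, handed out
// round-robin so heartbeats spread over several sockets instead of serializing
// every caller behind a single IPC connection's lock.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class AmConnectionPool<C> {
  private final List<C> connections;
  private final AtomicInteger next = new AtomicInteger();

  AmConnectionPool(int poolSize, Supplier<C> connectionFactory) {
    List<C> conns = new ArrayList<>(poolSize);
    for (int i = 0; i < poolSize; i++) {
      conns.add(connectionFactory.get()); // all instances created up front
    }
    this.connections = conns;
  }

  C get() {
    // floorMod keeps the index valid even after the counter overflows.
    return connections.get(Math.floorMod(next.getAndIncrement(), connections.size()));
  }
}
{code}
Lazy seeding (start with one connection and grow under load) would avoid the eager creation, at the cost of extra synchronization on the growth path — the trade-off the comment alludes to.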
[jira] [Updated] (HIVE-16634) LLAP Use a pool of connections to a single AM from a daemon
[ https://issues.apache.org/jira/browse/HIVE-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16634: -- Reporter: Rajesh Balamohan (was: Siddharth Seth) > LLAP Use a pool of connections to a single AM from a daemon > --- > > Key: HIVE-16634 > URL: https://issues.apache.org/jira/browse/HIVE-16634 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Siddharth Seth > Attachments: HIVE-16634.01.patch, locked-threads-ipc.png > > > !locked-threads-ipc.png! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16635) Progressbar: Use a step-function for CHECK_INTERVAL timeouts
[ https://issues.apache.org/jira/browse/HIVE-16635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005551#comment-16005551 ] Siddharth Seth commented on HIVE-16635: --- +1. Couple of minor nits. - 2.5s is pretty high, and will affect usability of the UI. Can this be lowered, maybe in combination with TEZ-3719? - {code}final int maxFailures = (MAX_RETRY_INTERVAL/MAX_CHECK_INTERVAL)+1;{code} This may as well be a private static final. > Progressbar: Use a step-function for CHECK_INTERVAL timeouts > > > Key: HIVE-16635 > URL: https://issues.apache.org/jira/browse/HIVE-16635 > Project: Hive > Issue Type: Improvement > Components: Tez >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16635.1.patch > > > Fewer getProgress() calls can speed up Tez by ~20% - see TEZ-3719 for the other > half of this improvement. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
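For illustration, a step-function for the check interval could look like the following (hypothetical names and values; the actual HIVE-16635 patch and its constants, such as MAX_RETRY_INTERVAL and MAX_CHECK_INTERVAL above, may differ):
{code}
// Hypothetical sketch of a step-function poll interval: poll fast at first so
// short queries get responsive progress, then back off toward a cap so long
// queries issue far fewer getProgress() calls.
final class ProgressPollInterval {
  private static final int CHECK_INTERVAL_MS = 200;      // initial fast polling
  private static final int MAX_CHECK_INTERVAL_MS = 1000; // capped steady state
  private static final int STEP_MS = 200;                // growth per poll

  static int nextInterval(int pollCount) {
    // Produces 200, 400, 600, 800, 1000, 1000, ... instead of a fixed interval.
    return Math.min(CHECK_INTERVAL_MS + pollCount * STEP_MS, MAX_CHECK_INTERVAL_MS);
  }
}
{code}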
[jira] [Updated] (HIVE-16634) LLAP Use a pool of connections to a single AM from a daemon
[ https://issues.apache.org/jira/browse/HIVE-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16634: -- Status: Patch Available (was: Open) > LLAP Use a pool of connections to a single AM from a daemon > --- > > Key: HIVE-16634 > URL: https://issues.apache.org/jira/browse/HIVE-16634 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16634.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16634) LLAP Use a pool of connections to a single AM from a daemon
[ https://issues.apache.org/jira/browse/HIVE-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-16634: -- Attachment: HIVE-16634.01.patch Patch to fix this. cc [~sershe] for review. > LLAP Use a pool of connections to a single AM from a daemon > --- > > Key: HIVE-16634 > URL: https://issues.apache.org/jira/browse/HIVE-16634 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-16634.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16634) LLAP Use a pool of connections to a single AM from a daemon
[ https://issues.apache.org/jira/browse/HIVE-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-16634: - > LLAP Use a pool of connections to a single AM from a daemon > --- > > Key: HIVE-16634 > URL: https://issues.apache.org/jira/browse/HIVE-16634 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
[ https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001923#comment-16001923 ] Siddharth Seth commented on HIVE-16343: --- +1. I'd still test the smap approach for perf. > LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring > > > Key: HIVE-16343 > URL: https://issues.apache.org/jira/browse/HIVE-16343 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16343.1.patch, HIVE-16343.2.patch > > > Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful > for monitoring and also setting up triggers via JMC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
[ https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981707#comment-15981707 ] Siddharth Seth commented on HIVE-16343: --- This lookup can be quite expensive, e.g. the SMAPS-based lookup can take multiple seconds. I don't think refreshing it every 10s is a good idea. Need to have some kind of guard around when it gets refreshed (independent of the metrics config). > LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring > > > Key: HIVE-16343 > URL: https://issues.apache.org/jira/browse/HIVE-16343 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16343.1.patch > > > Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful > for monitoring and also setting up triggers via JMC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
[ https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981608#comment-15981608 ] Siddharth Seth commented on HIVE-16343: --- Getting access to the PID: is there an easier and more reliable way to do this than relying on a pid file? Tez/YARN use the following - while launching the process, environment.put("JVM_PID", "$$") / export; within the process, System.getenv().get("JVM_PID"). If retaining the current method of accessing the pid file, please move it to a helper class; the daemon class is getting a little noisy. We may want to introduce a config for which process monitor to use, instead of relying on a YARN configuration. How often will the metrics be collected? > LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring > > > Key: HIVE-16343 > URL: https://issues.apache.org/jira/browse/HIVE-16343 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16343.1.patch > > > Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful > for monitoring and also setting up triggers via JMC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
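The Tez/YARN technique described in the comment above, sketched as a small helper. The helper-class name and the pid-file fallback are hypothetical; only the JVM_PID mechanism itself is taken from the comment.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that isolates PID discovery from the daemon class.
// The launcher exports the shell's PID before exec'ing the JVM: the launch
// code adds environment.put("JVM_PID", "$$"), and the launch script's shell
// expands $$ to the process id when the command actually runs.
public final class ProcessPidHelper {
  private ProcessPidHelper() {}

  // Prefer the env var set at launch; fall back to a pid file if present.
  public static String getPid(Path pidFile) throws IOException {
    String pid = System.getenv("JVM_PID");
    if (pid != null && !pid.isEmpty()) {
      return pid;
    }
    return new String(Files.readAllBytes(pidFile)).trim();
  }
}
{code}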
[jira] [Updated] (HIVE-15786) Provide additional information from the llapstatus command
[ https://issues.apache.org/jira/browse/HIVE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-15786: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > Provide additional information from the llapstatus command > -- > > Key: HIVE-15786 > URL: https://issues.apache.org/jira/browse/HIVE-15786 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-15786.01.patch, HIVE-15786.03.patch, > HIVE-15786.04.patch, HIVE-15786.05.patch > > > Slider is making enhancements to provide additional information like > completed containers, pending containers etc. > Integrate with this to provide additional details in llapstatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15786) Provide additional information from the llapstatus command
[ https://issues.apache.org/jira/browse/HIVE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977797#comment-15977797 ] Siddharth Seth commented on HIVE-15786: --- Thanks for the review. Committed. [~owen.omalley] - I'd like to get this into the 2.2 release as well. It could not be committed earlier because a Slider release was not available. > Provide additional information from the llapstatus command > -- > > Key: HIVE-15786 > URL: https://issues.apache.org/jira/browse/HIVE-15786 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: HIVE-15786.01.patch, HIVE-15786.03.patch, > HIVE-15786.04.patch, HIVE-15786.05.patch > > > Slider is making enhancements to provide additional information like > completed containers, pending containers etc. > Integrate with this to provide additional details in llapstatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15786) Provide additional information from the llapstatus command
[ https://issues.apache.org/jira/browse/HIVE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975578#comment-15975578 ] Siddharth Seth commented on HIVE-15786: --- [~prasanth_j] - could you take another look, please? This is the old patch with a fixed dependency version. > Provide additional information from the llapstatus command > -- > > Key: HIVE-15786 > URL: https://issues.apache.org/jira/browse/HIVE-15786 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: HIVE-15786.01.patch, HIVE-15786.03.patch, > HIVE-15786.04.patch, HIVE-15786.05.patch > > > Slider is making enhancements to provide additional information like > completed containers, pending containers etc. > Integrate with this to provide additional details in llapstatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16461) DagUtils checks local resource size on the remote fs
[ https://issues.apache.org/jira/browse/HIVE-16461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974210#comment-15974210 ] Siddharth Seth commented on HIVE-16461: --- +1 for the patch. There are sections of that code which, I suspect, get invoked incorrectly - e.g., invoking copyFromLocal APIs on files where the actual filesystem has been specified. > DagUtils checks local resource size on the remote fs > > > Key: HIVE-16461 > URL: https://issues.apache.org/jira/browse/HIVE-16461 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-16461.patch > > > The path for a local file may have no scheme. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
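To illustrate the class of bug this issue fixes: a local file's size has to be checked against the filesystem the path actually lives on, falling back to the local filesystem when the path carries no scheme. A hedged sketch using standard Hadoop APIs (the helper name is an assumption, not DagUtils' actual code):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ResourceSizeUtil {
  private ResourceSizeUtil() {}

  // Resolve a (possibly scheme-less, i.e. local) path against the
  // filesystem it actually lives on, rather than the default/remote FS.
  public static long getResourceSize(Configuration conf, Path path) throws IOException {
    FileSystem fs = (path.toUri().getScheme() == null)
        ? FileSystem.getLocal(conf)   // no scheme: treat as a local file
        : path.getFileSystem(conf);   // scheme present: use that filesystem
    return fs.getFileStatus(path).getLen();
  }
}
{code}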
[jira] [Updated] (HIVE-15786) Provide additional information from the llapstatus command
[ https://issues.apache.org/jira/browse/HIVE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-15786: -- Attachment: HIVE-15786.05.patch Updated patch to use the released version of Slider. > Provide additional information from the llapstatus command > -- > > Key: HIVE-15786 > URL: https://issues.apache.org/jira/browse/HIVE-15786 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: HIVE-15786.01.patch, HIVE-15786.03.patch, > HIVE-15786.04.patch, HIVE-15786.05.patch > > > Slider is making enhancements to provide additional information like > completed containers, pending containers etc. > Integrate with this to provide additional details in llapstatus. -- This message was sent by Atlassian JIRA (v6.3.15#6346)