[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299079#comment-16299079 ] Thejas M Nair commented on HIVE-17853: -- Created HIVE-18322. [~cdrome] Do you think you will be able to take a stab at it ? > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Fix For: 3.0.0, 2.4.0, 2.2.1 > > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299065#comment-16299065 ] Thejas M Nair commented on HIVE-17853: -- We should also check to see if current user is same as the original UGI user, and not do the ugi.doAs() if it is the same. Otherwise, this can potentially cause problems where the users are not privileged users (ie, there is no intent to do a "doAs"). You would get errors like " userX is not allowed to impersonate userX". > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Fix For: 3.0.0, 2.4.0, 2.2.1 > > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275046#comment-16275046 ] Mithun Radhakrishnan commented on HIVE-17853: - Checked into {{master}}, {{branch-2}}, and {{branch-2.2}}. Thank you for the fix, [~cdrome]! And thank you, [~vihangk1], for taking the time to review. > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Fix For: 3.0.0, 2.4.0, 2.2.1 > > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275017#comment-16275017 ] Mithun Radhakrishnan commented on HIVE-17853: - I'll check this version of the fix in. I have raised HIVE-18199 to track the addition of the unit-test. > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234426#comment-16234426 ] Chris Drome commented on HIVE-17853: I'm part way through a possible unittest now. Let's see how that goes. Unittests have not run for any of the other candidate branches yet. Will try to coerce builds to run. > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234338#comment-16234338 ] Vihang Karajgaonkar commented on HIVE-17853: bq. Right. HS2 doesn't come into it, since this has more to do with HCatClient. The HCatalog APIs use HiveClientCache to amortize the cost of HiveMetaStoreClient construction and metastore connections. Systems like Oozie/Falcon that use HCatClient to make metastore-calls within a doAs() context might land up losing their UGI.doAs() contexts after timeout, causing any retried actions to run as a privileged, rather than the impersonated user. Thanks [~mithun] for the clarification. I am fine with adding unit-tests in a followup JIRA. May be we should also refactor the reconnect using UGI logic in a separate method? the {{invoke}} method is already pretty unwieldy will the different try catch clauses. Rest looks fine to me. +1 > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234314#comment-16234314 ] Mithun Radhakrishnan commented on HIVE-17853: - [~vihangk1], would you be averse to our adding a unit-test in a followup JIRA? > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233534#comment-16233534 ] Mithun Radhakrishnan commented on HIVE-17853: - For the record, we've tested this out manually, using an Oozie setup, with user-impersonation. I suppose it might be possible to set up a unit-test using something based on {{MiniHiveKdc}}, but it looks non-trivial. Hmm... > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227401#comment-16227401 ] Mithun Radhakrishnan commented on HIVE-17853: - [~vihangk1], bq. All subsequent actions coming via HS2 should also do a doAs() using the sessionProxy. Is this happening in case of HCatalog... Right. HS2 doesn't come into it, since this has more to do with {{HCatClient}}. The HCatalog APIs use {{HiveClientCache}} to amortize the cost of {{HiveMetaStoreClient}} construction and metastore connections. Systems like Oozie/Falcon that use {{HCatClient}} to make metastore-calls within a {{doAs()}} context might land up losing their {{UGI.doAs()}} contexts after timeout, causing any retried actions to run as a privileged, rather than the impersonated user. > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227311#comment-16227311 ] Chris Drome commented on HIVE-17853: [~vihangk1], as per the description, consider the case of Oozie {{oozie}} impersonating a different user {{mithun}}. The {{oozie}} user will create a client and open the connection to the metastore within the doAs clause, which means that all operations during this session are performed as {{mithun}}. A retry/reconnect can occur if the read timeout for an operation is exceeded or the lifetime of the connection is exceeded. At this point, {{close}} is called explicitly, followed by a call to {{open}} to establish a new connection. However, the reconnect call is not being performed in a doAs context, so it will create a new connection to the metastore as {{oozie}}. There is no specific stack trace to attach here as it depends on the operations executed after the reconnect, and typically manifests as a failure caused by insufficient privileges. Worst case, if {{oozie}} has more privileges than {{mithun}}, it will successfully perform operations that {{mithun}} is not allowed to perform. According to the API, fetching the UserGroupInformation object can throw an IOException. I'm not familiar with the cases under which this would occur. However, I didn't want to fail immediately, because if the connection was initially established within a doAs, the calling code should have been able to establish a proper identity. So I let as much work get accomplished until the reconnect fails, which shouldn't be a problem, because most metastore sessions are not long-lived. > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.2.patch, > HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227126#comment-16227126 ] Vihang Karajgaonkar commented on HIVE-17853: Thanks for the patch [~cdrome]. I am curious to understand why this happens. When impersonation is turned on the HMSClient is instantiated from the HiveSessionImplWithUGI and it does a doAs() when instantiating the client. All subsequent actions coming via HS2 should also do a doAs() using the sessionProxy. Is this happening in case of HCatalog (I am not very familiar with HCatalog)? It would be great if you provide some call-stack or example of when this happens. Thanks! Took a quick look at the patch. Couple questions. if (this.ugi == null) { 78LOG.warn("RetryingMetaStoreClient unable to determine current user UGI."); 79 } Will ugi ever be null? If not, may be this check is redundant. Can you also add a test case if possible? Thanks! > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.2.patch, > HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17853) RetryingMetaStoreClient loses UGI impersonation-context when reconnecting after timeout
[ https://issues.apache.org/jira/browse/HIVE-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226624#comment-16226624 ] Hive QA commented on HIVE-17853: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12894882/HIVE-17853.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11342 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155) org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222) org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7572/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7572/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7572/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12894882 - PreCommit-HIVE-Build > RetryingMetaStoreClient loses UGI impersonation-context when reconnecting > after timeout > --- > > Key: HIVE-17853 > URL: https://issues.apache.org/jira/browse/HIVE-17853 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 2.4.0, 2.2.1 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome >Priority: Critical > Attachments: HIVE-17853.01-branch-2.2.patch, > HIVE-17853.01-branch-2.patch, HIVE-17853.01.patch > > > The {{RetryingMetaStoreClient}} is used to automatically reconnect to the > Hive metastore, after client timeout, transparently to the user. > In case of user impersonation (e.g. Oozie super-user {{oozie}} impersonating > a Hadoop user {{mithun}}, to run a workflow), in case of timeout, we find > that the reconnect causes the {{UGI.doAs()}} context to be lost. Any further > metastore operations will be attempted as the login-user ({{oozie}}), as > opposed to the effective user ({{mithun}}). > We should have a fix for this shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)