[jira] [Resolved] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Toshihiro Suzuki resolved AMBARI-22918.
---
    Resolution: Fixed

> Decommission RegionServer fails when kerberos is enabled
>
>                 Key: AMBARI-22918
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22918
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Toshihiro Suzuki
>            Assignee: Toshihiro Suzuki
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0, 2.6.2
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When kerberos is enabled, Decommission RegionServer fails with the following errors:
> stderr:
> {code:java}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 114, in <module>
>     HbaseMaster().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 55, in decommission
>     hbase_decommission(env)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py", line 84, in hbase_decommission
>     logoutput=True
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/mast...@example.com; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1' returned 1. Error: Could not find or load main class org.jruby.Main
> {code}
> stdout:
> {code:java}
> 2018-02-06 07:25:03,453 - Stack Feature Version Info: Cluster Stack=2.6, Cluster Current Version=2.6.2.0-205, Command Stack=None, Command Version=2.6.2.0-205 -> 2.6.2.0-205
> 2018-02-06 07:25:03,476 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2018-02-06 07:25:03,484 - checked_call['hostid'] {}
> 2018-02-06 07:25:03,490 - checked_call returned (0, '1aacc56c')
> 2018-02-06 07:25:03,502 - File['/usr/hdp/current/hbase-master/bin/draining_servers.rb'] {'content': StaticFile('draining_servers.rb'), 'mode': 0755}
> 2018-02-06 07:25:03,504 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/mast...@example.com; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1'] {'logoutput': True, 'user': 'hbase'}
> Error: Could not find or load main class org.jruby.Main
> Command failed after 1 tries
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Toshihiro Suzuki updated AMBARI-22918:
--
    Fix Version/s: 2.6.2
                   3.0.0
[jira] [Commented] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370852#comment-16370852 ]

Toshihiro Suzuki commented on AMBARI-22918:
---

It looks like the patches were merged and this issue was resolved. Can anyone please mark this Jira as Resolved and assign to me? Thanks.

> Decommission RegionServer fails when kerberos is enabled
>
>                 Key: AMBARI-22918
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22918
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Toshihiro Suzuki
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
[jira] [Commented] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365089#comment-16365089 ]

Toshihiro Suzuki commented on AMBARI-22918:
---

[~rlevas] I just sent PRs for trunk and branch-2.6 to fix the unit test failure.

> Decommission RegionServer fails when kerberos is enabled
>
>                 Key: AMBARI-22918
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22918
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Toshihiro Suzuki
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h
[jira] [Commented] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357873#comment-16357873 ]

Toshihiro Suzuki commented on AMBARI-22918:
---

Thanks [~elserj]. So it seems to me that the patch is good. Will wait for the patch to be merged.

> Decommission RegionServer fails when kerberos is enabled
>
>                 Key: AMBARI-22918
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22918
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Toshihiro Suzuki
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
[jira] [Commented] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356315#comment-16356315 ]

Toshihiro Suzuki commented on AMBARI-22918:
---

{quote}
Also, it seems like we don't need even hbase_client_jaas.conf. Even if I removed "java.security.auth.login.config" from command line (hbase_master_jaas.conf) and hbase-env.sh (hbase_client_jaas.conf), the command worked. Therefore, it looks like we don't need to set any jaas file in this case.
{quote}
Sorry, I was wrong. I tried again, and it looks like we need either hbase_client_jaas.conf or hbase_master_jaas.conf to run the command successfully.

In that case, I feel it's better to specify {master_security_config} in hbase_decommission.py, because then decommission will succeed even if a user removes hbase_client_jaas.conf from hbase-env.sh.

What do you think? [~elserj] [~rguruvannagari]

> Decommission RegionServer fails when kerberos is enabled
>
>                 Key: AMBARI-22918
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22918
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Toshihiro Suzuki
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
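The discussion in this thread turns on which "java.security.auth.login.config" value the JVM ends up with when the flag is passed more than once. The following is a minimal sketch of that reasoning, assuming the "last occurrence wins" behaviour that was observed in one environment in this thread; the helper name effective_system_property and the shortened argument list are hypothetical and only illustrate the idea.

```python
def effective_system_property(jvm_args, name):
    """Return the value of the last -D<name>=... in jvm_args, or None if absent."""
    prefix = "-D" + name + "="
    value = None
    for arg in jvm_args:
        if arg.startswith(prefix):
            # Assumption: a later occurrence overrides an earlier one.
            value = arg[len(prefix):]
    return value


CONF_DIR = "/usr/hdp/current/hbase-master/conf"
PROP = "java.security.auth.login.config"

# Shortened, illustrative version of the JVM arguments discussed in the thread.
jvm_args = [
    "-Dproc_org.jruby.Main",
    "-D%s=%s/hbase_master_jaas.conf" % (PROP, CONF_DIR),  # passed on the command line
    "-XX:+UseConcMarkSweepGC",
    "-D%s=%s/hbase_client_jaas.conf" % (PROP, CONF_DIR),  # set in hbase-env.sh
    "org.jruby.Main",
]

effective = effective_system_property(jvm_args, PROP)
print(effective)
```

Under that assumption the sketch reports hbase_client_jaas.conf, which is why removing the client setting from hbase-env.sh changes which JAAS file is actually in effect.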
[jira] [Commented] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355517#comment-16355517 ]

Toshihiro Suzuki commented on AMBARI-22918:
---

When I ran the command in my previous comment, the ps output was as follows:
{code:java}
root 2891 154 1.4 3226792 87852 pts/0 Sl+ 09:40 0:01 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_org.jruby.Main -XX:OnOutOfMemoryError=kill -9 %p -Dhdp.version=2.6.2.0-205 -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_client_jaas.conf -Djava.io.tmpdir=/tmp -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase.log -Dhbase.home.dir=/usr/hdp/2.6.2.0-205/hbase -Dhbase.id.str= -Dhbase.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.6.2.0-205/hadoop/lib/native/Linux-amd64-64:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/2.6.2.0-205/hadoop/lib/native -Dhbase.security.logger=INFO,NullAppender org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1
{code}
It seems that "java.security.auth.login.config" was set twice.

The following is set on the command line:
{code:java}
-Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf
{code}
The following is set in hbase-env.sh:
{code:java}
-Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_client_jaas.conf
{code}
And as [~rguruvannagari] mentioned, it seems that the JVM uses the second one (at least it did in my env).

I thought that we needed hbase_master_jaas.conf when running draining_servers.rb (and region_mover.rb), but it seems that we don't.

Also, it seems that we don't even need hbase_client_jaas.conf. Even when I removed "java.security.auth.login.config" from both the command line (hbase_master_jaas.conf) and hbase-env.sh (hbase_client_jaas.conf), the command worked. Therefore, it looks like we don't need to set any jaas file in this case.

As for a fix for this issue, we can simply remove "{master_security_config}", as [~rguruvannagari] suggested:
{code}
66 "{kinit_cmd} {hbase_cmd} --config {hbase_conf_dir} org.jruby.Main {region_drainer} remove {host}")
78 "{kinit_cmd} {hbase_cmd} --config {hbase_conf_dir} org.jruby.Main {region_drainer} add {host}")
80 "{kinit_cmd} {hbase_cmd} --config {hbase_conf_dir} org.jruby.Main {region_mover} unload {host}")
{code}
Any objections?
[jira] [Commented] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
[ https://issues.apache.org/jira/browse/AMBARI-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355136#comment-16355136 ]

Toshihiro Suzuki commented on AMBARI-22918:
-------------------------------------------

[~rguruvannagari] How did you reproduce the issue? When I ran the following command manually after applying the patch, the issue did not occur:
{code:java}
HBASE_OPTS="$HBASE_OPTS -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf" /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1
{code}

> Decommission RegionServer fails when kerberos is enabled
> --------------------------------------------------------
>
> Key: AMBARI-22918
> URL: https://issues.apache.org/jira/browse/AMBARI-22918
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Reporter: Toshihiro Suzuki
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When kerberos is enabled, Decommission RegionServer fails with the following errors:
> stderr:
> {code:java}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 114, in
>     HbaseMaster().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 55, in decommission
>     hbase_decommission(env)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py", line 84, in hbase_decommission
>     logoutput=True
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/mast...@example.com; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1' returned 1. Error: Could not find or load main class org.jruby.Main
> {code}
> stdout:
> {code:java}
> 2018-02-06 07:25:03,453 - Stack Feature Version Info: Cluster Stack=2.6, Cluster Current Version=2.6.2.0-205, Command Stack=None, Command Version=2.6.2.0-205 -> 2.6.2.0-205
> 2018-02-06 07:25:03,476 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2018-02-06 07:25:03,484 - checked_call['hostid'] {}
> 2018-02-06 07:25:03,490 - checked_call returned (0, '1aacc56c')
> 2018-02-06 07:25:03,502 - File['/usr/hdp/current/hbase-master/bin/draining_servers.rb'] {'content': StaticFile('draining_servers.rb'), 'mode': 0755}
> 2018-02-06 07:25:03,504 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/mast...@example.com; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1'] {'logoutput': True, 'user': 'hbase'}
> Error: Could not find or load main class org.jruby.Main
> Command failed after 1 tries
> {code}
> --
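
The two invocations quoted in this message differ only in how the JAAS option reaches the launcher. The failing one passes -Djava.security.auth.login.config=... as an argument after --config, while the manually tested one appends it to HBASE_OPTS. The following is a minimal standalone Python 3 sketch of building the second form. It is not the Ambari patch: the helper name build_drain_command and the module-level constants are hypothetical, and the paths are copied from the command above.

```python
import shlex

# Paths copied from the command quoted in this thread.
HBASE_BIN = "/usr/hdp/current/hbase-master/bin/hbase"
HBASE_CONF = "/usr/hdp/current/hbase-master/conf"
JAAS_CONF = HBASE_CONF + "/hbase_master_jaas.conf"
DRAIN_SCRIPT = "/usr/hdp/current/hbase-master/bin/draining_servers.rb"


def build_drain_command(action, host, jaas_conf=None):
    """Hypothetical helper: build a shell command line for draining_servers.rb.

    When jaas_conf is given, the JAAS option is appended to HBASE_OPTS
    instead of being placed among the arguments of the hbase launcher.
    """
    args = [HBASE_BIN, "--config", HBASE_CONF,
            "org.jruby.Main", DRAIN_SCRIPT, action, host]
    command = " ".join(shlex.quote(a) for a in args)
    if jaas_conf:
        jaas_opt = "-Djava.security.auth.login.config=" + jaas_conf
        # Double quotes let the shell expand any existing $HBASE_OPTS.
        # Assumption: jaas_conf contains no characters that are special
        # inside double quotes; this sketch does not escape them.
        command = 'HBASE_OPTS="$HBASE_OPTS {0}" {1}'.format(jaas_opt, command)
    return command


if __name__ == "__main__":
    print(build_drain_command("add", "worker1", jaas_conf=JAAS_CONF))
```

With the arguments shown in the __main__ block, the helper returns the same command line that was run manually in the comment above.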
[jira] [Created] (AMBARI-22918) Decommission RegionServer fails when kerberos is enabled
Toshihiro Suzuki created AMBARI-22918:
--------------------------------------

Summary: Decommission RegionServer fails when kerberos is enabled
Key: AMBARI-22918
URL: https://issues.apache.org/jira/browse/AMBARI-22918
Project: Ambari
Issue Type: Bug
Components: ambari-server
Reporter: Toshihiro Suzuki

When kerberos is enabled, Decommission RegionServer fails with the following errors:

stderr:
{code:java}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 114, in
    HbaseMaster().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 55, in decommission
    hbase_decommission(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py", line 84, in hbase_decommission
    logoutput=True
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/mast...@example.com; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1' returned 1. Error: Could not find or load main class org.jruby.Main
{code}

stdout:
{code:java}
2018-02-06 07:25:03,453 - Stack Feature Version Info: Cluster Stack=2.6, Cluster Current Version=2.6.2.0-205, Command Stack=None, Command Version=2.6.2.0-205 -> 2.6.2.0-205
2018-02-06 07:25:03,476 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2018-02-06 07:25:03,484 - checked_call['hostid'] {}
2018-02-06 07:25:03,490 - checked_call returned (0, '1aacc56c')
2018-02-06 07:25:03,502 - File['/usr/hdp/current/hbase-master/bin/draining_servers.rb'] {'content': StaticFile('draining_servers.rb'), 'mode': 0755}
2018-02-06 07:25:03,504 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/mast...@example.com; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add worker1'] {'logoutput': True, 'user': 'hbase'}
Error: Could not find or load main class org.jruby.Main
Command failed after 1 tries
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)