[jira] [Created] (HBASE-22852) hbase nightlies leaking gpg-agents

2019-08-14 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HBASE-22852:


 Summary: hbase nightlies leaking gpg-agents
 Key: HBASE-22852
 URL: https://issues.apache.org/jira/browse/HBASE-22852
 Project: HBase
  Issue Type: Bug
Reporter: Allen Wittenauer


FYI, just triggered yetus master, which includes code to find and kill 
long-running processes still attached to the Jenkins workspace directory.  It 
came up with this:

https://builds.apache.org/view/S-Z/view/Yetus/job/yetus-github-multibranch/job/master/134/console

{code}
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jenkins    752  0.0  0.0  93612   584 ?        Ss   Aug12   0:00 gpg-agent --homedir /home/jenkins/jenkins-slave/workspace/HBase_Nightly_HBASE-20952/downloads-hadoop-2/.gpg --use-standard-socket --daemon
Killing 752 ***
{code}

(repeated 10s of times, with slightly different dates, pids, versions, etc.)

Also, be aware that any other process running on the node (such as the other 
executor) has extremely easy access to whatever gpg creds you are using...
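
For anyone wanting to check their own nodes, here is a minimal Python sketch of the "find processes still attached to a workspace" idea. It is not the Yetus implementation; the function name `pids_attached_to` and the second sample row are made up for illustration. It only parses `ps aux`-style text, so it can be tried without touching a real node:

```python
from typing import List


def pids_attached_to(ps_output: str, workspace: str) -> List[int]:
    """Return PIDs whose command line mentions the given workspace path."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # `ps aux` prints 10 fixed columns; the 11th is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        if workspace in fields[10]:
            pids.append(int(fields[1]))
    return pids


# First process row is from the report above; the second is a hypothetical
# unrelated process added only to show that it is filtered out.
sample = "\n".join([
    "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
    "jenkins    752  0.0  0.0  93612   584 ?        Ss   Aug12   0:00 gpg-agent "
    "--homedir /home/jenkins/jenkins-slave/workspace/HBase_Nightly_HBASE-20952/"
    "downloads-hadoop-2/.gpg --use-standard-socket --daemon",
    "jenkins    901  0.0  0.1  20000  1200 ?        S    Aug12   0:01 sshd: jenkins",
])

leaked = pids_attached_to(sample, "/home/jenkins/jenkins-slave/workspace")
print(leaked)
```

On a real node the matching PIDs could then be passed to `os.kill(pid, signal.SIGTERM)`, though it is cleaner for the job itself to stop its gpg-agent before it exits.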



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (HBASE-22167) Unify the new github based pre commit job and our nightly job

2019-04-11 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815496#comment-16815496
 ] 

Allen Wittenauer edited comment on HBASE-22167 at 4/11/19 3:14 PM:
---

Definitely not supported yet.

Jenkins auth tokens aren't supported by Pipeline jobs.  This means that the 
jenkins-admin code needs to actually have a real user to auth against jenkins 
in order to submit jobs.  The alternative is to have jenkins-admin write 
something that can be read by groovy code sitting in a Jenkins pipeline that 
does a job submission without needing to auth.  If that path is taken, then it 
also needs to have a {project list} ->  {job list} mapping, since there is no 
real 1:1 mapping anymore.  (e.g., HADOOP, HDFS, YARN, ... -> 
hadoop-multibranch-pipeline)
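
To make the {project list} -> {job list} idea concrete, a rough Python sketch follows. The table name `PROJECT_TO_JOB` and the helper `job_for_issue` are hypothetical; only the HADOOP/HDFS/YARN example comes from the comment above, and a real version would live wherever jenkins-admin or the groovy side can read it:

```python
from typing import Dict, Optional

# Several JIRA projects can share one multibranch job, so the mapping is
# many-to-one rather than 1:1.
PROJECT_TO_JOB: Dict[str, str] = {
    "HADOOP": "hadoop-multibranch-pipeline",
    "HDFS": "hadoop-multibranch-pipeline",
    "YARN": "hadoop-multibranch-pipeline",
}


def job_for_issue(issue_key: str) -> Optional[str]:
    """Map an issue key such as 'HDFS-1234' to a Jenkins job name, if known."""
    project = issue_key.split("-", 1)[0].upper()
    return PROJECT_TO_JOB.get(project)
```

Returning `None` for an unmapped project leaves it to the caller to decide whether to skip the issue or fall back to some default job.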

Yet Another Alternative is to try and replace jenkins-admin with the 
jenkins-jira plugin.  It's loaded on our Jenkins server, but my attempts to use 
it in any meaningful way fell apart since it didn't seem to understand 
attachments very well.  But the theory was that a pipeline job could be written 
that would take that plugin's input and just resubmit to the appropriate 
multibranch job.

All-in-all, it's a lot of work.  I had some sample code written to implement the 
jenkins-admin-as-a-pipeline-job but I can't seem to find it.  Plus I'm not 
doing much with the ASF anymore so it sort of fell off my priority list.



was (Author: aw):
Definitely not supported yet.

Jenkins auth tokens aren't supported by Pipeline jobs.  This means that the 
jenkins-admin code needs to actually have a real user to auth against jenkins 
in order to submit jobs.  The alternative is to have jenkins-admin write 
something that can be read by groovy code sitting in a Jenkins pipeline that 
does a job submission without needing to auth.  If that path is taken, then it 
also needs to have a {project list} ->  {job list} mapping, since there is no 
real 1:1 mapping anymore.  (e.g., HADOOP, HDFS, YARN, ... -> 
hadoop-multibranch-pipeline)

Yet Another Alternative is to try and replace jenkins-admin with the 
jenkins-jira plugin.  It's loaded on our Jenkins server, but my attempts to use 
it in any meaningful way fell apart since it didn't seem to understand 
attachments very well.  But the theory was that a pipeline job could be written 
that would take that plugin's input and just resubmit to the appropriate 
multibranch job.

All-in-all, it's a lot of work.  I had some sample code written to implement the 
latter but I can't seem to find it.  Plus I'm not doing much with the ASF 
anymore so it sort of fell off my priority list.


> Unify the new github based pre commit job and our nightly job
> -
>
> Key: HBASE-22167
> URL: https://issues.apache.org/jira/browse/HBASE-22167
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Priority: Minor
>
> Now we use two jenkins files and set up two jobs on jenkins. They both use 
> yetus and seems yetus 0.9.0 can have a PR tab and a branch tab in the same 
> job. So we can unify them together.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22167) Unify the new github based pre commit job and our nightly job

2019-04-11 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815496#comment-16815496
 ] 

Allen Wittenauer commented on HBASE-22167:
--

Definitely not supported yet.

Jenkins auth tokens aren't supported by Pipeline jobs.  This means that the 
jenkins-admin code needs to actually have a real user to auth against jenkins 
in order to submit jobs.  The alternative is to have jenkins-admin write 
something that can be read by groovy code sitting in a Jenkins pipeline that 
does a job submission without needing to auth.  If that path is taken, then it 
also needs to have a {project list} ->  {job list} mapping, since there is no 
real 1:1 mapping anymore.  (e.g., HADOOP, HDFS, YARN, ... -> 
hadoop-multibranch-pipeline)

Yet Another Alternative is to try and replace jenkins-admin with the 
jenkins-jira plugin.  It's loaded on our Jenkins server, but my attempts to use 
it in any meaningful way fell apart since it didn't seem to understand 
attachments very well.  But the theory was that a pipeline job could be written 
that would take that plugin's input and just resubmit to the appropriate 
multibranch job.

All-in-all, it's a lot of work.  I had some sample code written to implement the 
latter but I can't seem to find it.  Plus I'm not doing much with the ASF 
anymore so it sort of fell off my priority list.


> Unify the new github based pre commit job and our nightly job
> -
>
> Key: HBASE-22167
> URL: https://issues.apache.org/jira/browse/HBASE-22167
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Priority: Minor
>
> Now we use two jenkins files and set up two jobs on jenkins. They both use 
> yetus and seems yetus 0.9.0 can have a PR tab and a branch tab in the same 
> job. So we can unify them together.





[jira] [Commented] (HBASE-21955) Auto insert release changes and releasenotes in release scripts

2019-03-20 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797836#comment-16797836
 ] 

Allen Wittenauer commented on HBASE-21955:
--

btw, yetus 0.9.0 has a maven plugin that can run releasedocmaker.  no need to 
download yetus, etc, if you go that approach.

> Auto insert release changes and releasenotes in release scripts
> ---
>
> Key: HBASE-21955
> URL: https://issues.apache.org/jira/browse/HBASE-21955
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Priority: Major
> Attachments: rns.sh
>
>
> Should be able to script updating changes and releasenotes as part of 
> create-releases/release-build.sh.





[jira] [Commented] (HBASE-21432) [hbase-connectors] Add Apache Yetus integration for hbase-connectors repository

2018-11-23 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697372#comment-16697372
 ] 

Allen Wittenauer commented on HBASE-21432:
--

I've set up a job on the ASF Jenkins pointing to my tree and using Github Branch 
Source plugin: 

https://builds.apache.org/view/S-Z/view/Yetus/job/yetus-buretoolbox-demo/

(Normally all branches would be listed, but I've got it configured to only work 
with PRs to cut down the amount of extra output in the tabs and because I don't 
want to trigger a storm of jobs.)

By far, the biggest gotcha when working with this plugin is that jobs 
absolutely must delete their workspace on exit.  Otherwise slaves fill up fast. 
 (and I suspect this is one of the key problems with the ASF Jenkins infra and 
space... it isn't particularly obvious that each branch, pr, etc, gets its 
*own* workspace dir)
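
The "always delete the workspace on exit" rule is the usual try/finally pattern. A small Python sketch of that pattern, with the hypothetical names `scratch_workspace` and `run_build` (in an actual Jenkins Pipeline the equivalent would be a cleanup step such as `deleteDir()` that runs regardless of build result):

```python
import contextlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def scratch_workspace(prefix: str = "ws-") -> Iterator[Path]:
    """Create a per-run workspace and remove it even if the build fails."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Runs on success, failure, or early return, so nothing piles up.
        shutil.rmtree(path, ignore_errors=True)


def run_build() -> Path:
    with scratch_workspace() as ws:
        (ws / "output.txt").write_text("build artifacts would go here\n")
        return ws  # returned only so a caller can confirm it was removed
```

Because every branch and PR gets its own directory, skipping this step multiplies leftover disk usage by the number of branches and PRs rather than by the number of jobs.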

> [hbase-connectors] Add Apache Yetus integration for hbase-connectors 
> repository 
> 
>
> Key: HBASE-21432
> URL: https://issues.apache.org/jira/browse/HBASE-21432
> Project: HBase
>  Issue Type: Task
>  Components: build, hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Peter Somogyi
>Priority: Major
>
> Add automated testing for pull requests and patch files created for 
> hbase-connectors repository. 





[jira] [Comment Edited] (HBASE-21432) [hbase-connectors] Add Apache Yetus integration for hbase-connectors repository

2018-11-23 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696270#comment-16696270
 ] 

Allen Wittenauer edited comment on HBASE-21432 at 11/23/18 5:13 PM:


The Github Pull Request Builder has been mostly replaced by the Github Branch 
Source Plugin (which is what I meant above).  The big gotcha is that gprb is 
for freestyle jobs, gbsp is for pipeline jobs.

Using Jenkins w/a Multibranch Pipeline + Github Branch Source Plugin, it 
enables a tab-based system that allows one to re-run branches, PR, and, if 
enabled, tags on demand from the Jenkins UI.  The Yetus integration that is 
done as part of YETUS-681 (and committed into my dev branch) just makes 
test-patch smarter to know it is running under Jenkins, where the PR is at, 
etc., so that there isn't a need to manually configure or pass parameters for 
details that Jenkins itself shares via environment variables.  This means that 
one stanza in the Pipeline can do both full builds and incremental/PR builds 
with no work required in the Pipeline to figure out what type of run it is.
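
As a rough illustration of that kind of detection (not the actual test-patch logic, which is shell), the Python sketch below classifies a run from environment variables. `JENKINS_URL` and `CHANGE_ID` are variables Jenkins exports, the latter for pull requests in multibranch jobs; the function name `classify_run` and the return labels are made up here:

```python
from typing import Mapping


def classify_run(env: Mapping[str, str]) -> str:
    """Guess the run type from Jenkins-provided environment variables."""
    if "JENKINS_URL" not in env:
        return "local"          # not running under Jenkins at all
    if env.get("CHANGE_ID"):
        return "pull-request"   # multibranch jobs set CHANGE_ID for PRs
    return "branch"             # otherwise treat it as a full branch build
```

Called as `classify_run(os.environ)`, the same Pipeline stanza can then pick between full and incremental behaviour without being passed any extra parameters.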


was (Author: aw):
The Github Pull Request Builder has been mostly replaced by the Github Branch 
Source Plugin (which is what I meant above).  The big gotcha is that gprb is 
for freestyle jobs, gbsp is for pipeline jobs.

Using Jenkins w/a Multibranch Pipeline + Github Branch Source Plugin, it 
enables a tab-based system that allows one to re-run branches, PR, and, if 
enabled, tags on demand from the Jenkins UI.  The Yetus integration that is 
done as part of YETUS-708 (and committed into my dev branch) just makes 
test-patch smarter to know it is running under Jenkins, where the PR is at, 
etc., so that there isn't a need to manually configure or pass parameters for 
details that Jenkins itself shares via environment variables.  This means that 
one stanza in the Pipeline can do both full builds and incremental/PR builds 
with no work required in the Pipeline to figure out what type of run it is.

> [hbase-connectors] Add Apache Yetus integration for hbase-connectors 
> repository 
> 
>
> Key: HBASE-21432
> URL: https://issues.apache.org/jira/browse/HBASE-21432
> Project: HBase
>  Issue Type: Task
>  Components: build, hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Peter Somogyi
>Priority: Major
>
> Add automated testing for pull requests and patch files created for 
> hbase-connectors repository. 





[jira] [Commented] (HBASE-21432) [hbase-connectors] Add Apache Yetus integration for hbase-connectors repository

2018-11-22 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696270#comment-16696270
 ] 

Allen Wittenauer commented on HBASE-21432:
--

The Github Pull Request Builder has been mostly replaced by the Github Branch 
Source Plugin (which is what I meant above).  The big gotcha is that gprb is 
for freestyle jobs, gbsp is for pipeline jobs.

Using Jenkins w/a Multibranch Pipeline + Github Branch Source Plugin, it 
enables a tab-based system that allows one to re-run branches, PR, and, if 
enabled, tags on demand from the Jenkins UI.  The Yetus integration that is 
done as part of YETUS-708 (and committed into my dev branch) just makes 
test-patch smarter to know it is running under Jenkins, where the PR is at, 
etc., so that there isn't a need to manually configure or pass parameters for 
details that Jenkins itself shares via environment variables.  This means that 
one stanza in the Pipeline can do both full builds and incremental/PR builds 
with no work required in the Pipeline to figure out what type of run it is.

> [hbase-connectors] Add Apache Yetus integration for hbase-connectors 
> repository 
> 
>
> Key: HBASE-21432
> URL: https://issues.apache.org/jira/browse/HBASE-21432
> Project: HBase
>  Issue Type: Task
>  Components: build, hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Peter Somogyi
>Priority: Major
>
> Add automated testing for pull requests and patch files created for 
> hbase-connectors repository. 





[jira] [Commented] (HBASE-21432) [hbase-connectors] Add Apache Yetus integration for hbase-connectors repository

2018-11-19 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692743#comment-16692743
 ] 

Allen Wittenauer commented on HBASE-21432:
--

FWIW, I've got native support for Jenkins' Github Source Plugin ready to roll, 
just waiting for a series of patch reviews (since it is part of a chain).  With 
it, test-patch can use webhooks from github--and therefore no poll script 
required.   If you want to play with it, it's all sitting in either my github 
or gitlab repos.

> [hbase-connectors] Add Apache Yetus integration for hbase-connectors 
> repository 
> 
>
> Key: HBASE-21432
> URL: https://issues.apache.org/jira/browse/HBASE-21432
> Project: HBase
>  Issue Type: Task
>  Components: build, hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Peter Somogyi
>Priority: Major
>
> Add automated testing for pull requests and patch files created for 
> hbase-connectors repository. 





[jira] [Resolved] (HBASE-14163) hbase master stop loops both processes forever

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HBASE-14163.
--
Resolution: Won't Fix

> hbase master stop loops both processes forever
> --
>
> Key: HBASE-14163
> URL: https://issues.apache.org/jira/browse/HBASE-14163
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>Assignee: Andrew Purtell
>Priority: Major
>
> It would appear that there is an infinite loop in the zk client connection 
> code when performing a master stop when no external zk servers are configured.





[jira] [Commented] (HBASE-20971) Please add OWASP Dependency Check to the core build (pom.xml) and all sub-component builds.

2018-08-04 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569357#comment-16569357
 ] 

Allen Wittenauer commented on HBASE-20971:
--

FWIW, the Apache Yetus team is adding support for the OWASP dependency checker 
to precommit and qbt.  See YETUS-441 for details.


> Please add OWASP Dependency Check to the core build (pom.xml) and all 
> sub-component builds.
> ---
>
> Key: HBASE-20971
> URL: https://issues.apache.org/jira/browse/HBASE-20971
> Project: HBase
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 3.0.0, 2.2.0, 2.1.1
> Environment: All development, build, test, environments.
>Reporter: Albert Baker
>Priority: Major
>  Labels: build, easy-fix, security
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Please add OWASP Dependency Check to the build (pom.xml). OWASP DC makes an 
> outbound REST call to MITRE Common Vulnerabilities & Exposures (CVE) to 
> perform a lookup for each dependent .jar to list any/all known 
> vulnerabilities for each jar. This step is needed because a manual MITRE CVE 
> lookup/check on the main component does not include checking for 
> vulnerabilities in components or in dependent libraries.
> OWASP Dependency check : 
> https://www.owasp.org/index.php/OWASP_Dependency_Check has plug-ins for most 
> Java build/make types (ant, maven, ivy, gradle).
> Also, add the appropriate command to the nightly build to generate a report 
> of all known vulnerabilities in any/all third party libraries/dependencies 
> that get pulled in. example : mvn -Powasp -Dtest=false -DfailIfNoTests=false 
> clean aggregate
> Generating this report nightly/weekly will help inform the project's 
> development team if any dependent libraries have reported known 
> vulnerabilities. Project teams that keep up with removing vulnerabilities on a 
> weekly basis will help protect businesses that rely on these open source 
> components.





[jira] [Commented] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

2018-02-01 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349025#comment-16349025
 ] 

Allen Wittenauer commented on HBASE-19902:
--

https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/11330/console
 is  INFRA-15920 . 

It looks like test-patch averages a bit over 5k when doing certain modules, 
with the 6020 being a bit of an outlier.  But as you said, 6020 might even be 
low under the particular conditions.

I can't think of any other parameters that might be useful here.  Filed 
YETUS-612 to keep track of memory in the docker container.  It'd be nice to 
know how much mem and IO is being used.
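
For reference, a process+thread figure like the one discussed above can be approximated on Linux by summing the `Threads:` field of every `/proc/<pid>/status` file. The Python sketch below shows only the parsing side; it is an assumption about one way to get such a number, not how test-patch computes it, and the function names are hypothetical:

```python
import re
from typing import Iterable

# /proc/<pid>/status contains a line of the form "Threads:<tab>N".
_THREADS = re.compile(r"^Threads:\s+(\d+)$", re.MULTILINE)


def threads_in_status(status_text: str) -> int:
    """Extract the thread count from the text of one /proc/<pid>/status file."""
    match = _THREADS.search(status_text)
    return int(match.group(1)) if match else 0


def total_threads(status_texts: Iterable[str]) -> int:
    """Sum thread counts across many status files."""
    return sum(threads_in_status(text) for text in status_texts)
```

Inside the container the inputs would come from reading each path matched by `glob.glob("/proc/[0-9]*/status")`, ignoring processes that exit between listing and reading.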

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure what is going on w/ jenkins build
> Changed the hadoopqa config to output long process listing rather than just 
> 'java'... 
> I can't get loadavg... tried dumping /proc...
>  /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console, 
> see 7 java processes running on H2. Extra args on ps may help here to tell 
> whether it is zombies or us.
> Test run was fine then fell into hbase-server second part and soon after 
> started failing..
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at first test failure... this is where main thread is, trying to get 
> thread info:
> {code}
> Thread 23 (Time-limited test):
>   State: RUNNABLE
>   Blocked count: 118
>   Waited count: 58
>   Stack:
> sun.management.ThreadImpl.getThreadInfo1(Native Method)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
> 
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> 
> org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
> org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
> 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
> 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
> 
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
> Master is not coming up
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] 
> hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 3ms
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>   at 
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
>

[jira] [Commented] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

2018-02-01 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348987#comment-16348987
 ] 

Allen Wittenauer commented on HBASE-19902:
--

Youch. 11314 went well over 5k:

| Max. process+thread count | 6020 (vs. ulimit of 1) |



> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure what is going on w/ jenkins build
> Changed the hadoopqa config to output long process listing rather than just 
> 'java'... 
> I can't get loadavg... tried dumping /proc...
>  /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console, 
> see 7 java processes running on H2. Extra args on ps may help here to tell 
> whether it is zombies or us.
> Test run was fine then fell into hbase-server second part and soon after 
> started failing..
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at first test failure... this is where main thread is, trying to get 
> thread info:
> {code}
> Thread 23 (Time-limited test):
>   State: RUNNABLE
>   Blocked count: 118
>   Waited count: 58
>   Stack:
> sun.management.ThreadImpl.getThreadInfo1(Native Method)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
> 
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> 
> org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
> org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
> 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
> 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
> 
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
> Master is not coming up
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] 
> hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 3ms
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>   at 
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(F

[jira] [Comment Edited] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

2018-01-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348032#comment-16348032
 ] 

Allen Wittenauer edited comment on HBASE-19902 at 2/1/18 5:28 AM:
--

Awesome work! Thanks [~stack]. 

I spent some time looking over the output of various jobs.  At this point, I'm 
not entirely convinced that hbase is hitting the proc limit [*]. I'm more 
inclined to think that it's actually hitting the Docker memory. By chance, did 
anyone up the --dockermemlimit setting?  If not, try --dockermemlimit=20g .  
That should be less than half of the node's RAM.

EDIT:
* - at least, at anything past the 5k mark.  


was (Author: aw):
Awesome work! Thanks [~stack]. 

I spent some time looking over the output of various jobs.  At this point, I'm 
not entirely convinced that hbase is hitting the proc limit. I'm more inclined 
to think that it's actually hitting the Docker memory. By chance, did anyone up 
the --dockermemlimit setting?  If not, try --dockermemlimit=20g .  That should 
be less than half of the node's RAM.

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure what is going on w/ jenkins build
> Changed the hadoopqa config to output long process listing rather than just 
> 'java'... 
> I can't get loadavg... tried dumping /proc...
>  /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console, 
> see 7 java processes running on H2. Extra args on ps may help here to tell 
> whether it is zombies or us.
> Test run was fine then fell into hbase-server second part and soon after 
> started failing..
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at first test failure... this is where main thread is, trying to get 
> thread info:
> {code}
> Thread 23 (Time-limited test):
>   State: RUNNABLE
>   Blocked count: 118
>   Waited count: 58
>   Stack:
> sun.management.ThreadImpl.getThreadInfo1(Native Method)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
> 
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> 
> org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
> org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
> 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
> 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
> 
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
> Master is not coming up
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] 
> hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 3ms
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>   at 

[jira] [Commented] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

2018-01-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348032#comment-16348032
 ] 

Allen Wittenauer commented on HBASE-19902:
--

Awesome work! Thanks [~stack]. 

I spent some time looking over the output of various jobs.  At this point, I'm 
not entirely convinced that hbase is hitting the proc limit. I'm more inclined 
to think that it's actually hitting the Docker memory. By chance, did anyone up 
the --dockermemlimit setting?  If not, try --dockermemlimit=20g .  That should 
be less than half of the node's RAM.
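
As a rough sketch of how one might sanity-check a value for --dockermemlimit, 
the snippet below reads MemTotal from /proc/meminfo (Linux only) and prints 
half of it in whole gigabytes.  The helper names are made up for illustration; 
only the --dockermemlimit flag itself comes from the discussion above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a size in kB, print half of it in whole GiB.
half_in_gib() {
  # 1 GiB = 1048576 kB; integer division rounds down.
  echo $(( ${1:-0} / 2 / 1048576 ))
}

# Hypothetical helper: total RAM in kB from /proc/meminfo, or 0 if unreadable.
mem_total_kb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
  else
    echo 0
  fi
}

# Any --dockermemlimit value should stay below this ceiling.
echo "half of this node's RAM: $(half_in_gib "$(mem_total_kb)")g"
```

On a node with 48 GiB of RAM this reports 24g, so 20g would fit under it.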

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure out what is going on w/ the jenkins build.
> Changed the hadoopqa config to output a long process listing rather than just 
> 'java'... 
> I can't get loadavg... tried dumping /proc...
>  /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console, 
> I see 7 java processes running on H2. Extra args on ps may help here to tell 
> whether it is zombies or us.
> The test run was fine, then fell into the hbase-server second part and soon 
> after started failing.
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at first test failure... this is where main thread is, trying to get 
> thread info:
> {code}
> Thread 23 (Time-limited test):
>   State: RUNNABLE
>   Blocked count: 118
>   Waited count: 58
>   Stack:
> sun.management.ThreadImpl.getThreadInfo1(Native Method)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
> 
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> 
> org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
> org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
> 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
> 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
> 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
> 
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
> Master is not coming up
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] 
> hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 30000ms
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>   at 
> org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.Nati

[jira] [Comment Edited] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

2018-01-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347336#comment-16347336
 ] 

Allen Wittenauer edited comment on HBASE-19902 at 1/31/18 6:30 PM:
---

bq. proclimit of 20k instead of 5k (looks like machines allow for 60k fds).

The proclimit option sets ulimit -u, the maximum number of processes allowed.  
There is no correlation with fds.  [Yetus does not set that ulimit value.]

The process limit is exceedingly tricky.  There is the actual value set by 
ulimit -u and friends.  Then there are cgroup settings enforced by systemd.  
The cgroup limit (set by the UserTasksMax in systemd settings) is the ultimate 
authority.  It also counts across the entire node, not by process group or 
session or any of the other normal boundaries.  The default limit ends up being 
a bit over 12k on the build nodes.

To make matters worse, Java native threads (on Linux, at least) count against 
this limit.  Running 'ps -L -u jenkins -o lwp' will give an approximate idea of 
how many processes are in play at any given time.  [The number reported by 
Yetus when in Docker mode is the same count, but only for what is running 
inside the container.] 

In the end, this means that all threads/processes consumed by BOTH executors 
and the jenkins slave process must be less than ~13k. 
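
A minimal sketch of the measurement described above, assuming a Linux node 
with procps 'ps'.  The function names and the 12288 figure are illustrative 
assumptions (the text only says "a bit over 12k"); the real value is whatever 
UserTasksMax resolves to on the node.

```shell
#!/usr/bin/env bash
# Count lightweight processes (threads) owned by a user, as 'ps -L' sees them.
count_lwps() {
  # 'lwp=' suppresses the header so every output line is one thread.
  ps -L -u "$1" -o lwp= 2>/dev/null | wc -l | tr -d ' '
}

# Remaining room under a task limit.  $1 = current count, $2 = limit.
headroom() {
  echo $(( $2 - $1 ))
}

user="$(id -un 2>/dev/null)"
used="$(count_lwps "$user")"
echo "threads+processes for ${user}: ${used}"
echo "per-user ulimit -u: $(ulimit -u 2>/dev/null || echo unknown)"
# 12288 is an assumed node-wide limit, not a value taken from the build nodes.
echo "headroom under an assumed limit of 12288: $(headroom "$used" 12288)"
```

Because the cgroup limit is node-wide, the count that matters is the total for 
the user across every executor, not the per-container number.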


was (Author: aw):
bq. proclimit of 20k instead of 5k (looks like machines allow for 60k fds).

The proclimit option sets ulimit -u, the maximum number of processes allowed.  
There is no correlation with fds.  [Yetus does not set that ulimit value.]

The process limit is exceedingly tricky.  There is the actual value set by 
ulimit -u and friends.  Then there are cgroup settings enforced by systemd.  
The cgroup limit (set by the UserTasksMax in systemd settings) is the ultimate 
authority.  It also counts across the entire node, not by process group or 
session or any of the other normal boundaries.  The default limit ends up being 
a bit over 12k on the build nodes.

To make matters worse, Java native threads count against this limit.  Running 
'ps -L -u jenkins -o lwp' will give an approximate idea of how many processes 
are in play at any given time.  [The number reported by Yetus when in Docker 
mode is this number but only present in the container.] 

In the end, this means that all threads/processes consumed by BOTH executors 
and the jenkins slave process must be less than ~13k. 

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>

[jira] [Commented] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

2018-01-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347336#comment-16347336
 ] 

Allen Wittenauer commented on HBASE-19902:
--

bq. proclimit of 20k instead of 5k (looks like machines allow for 60k fds).

The proclimit option sets ulimit -u, the maximum number of processes allowed.  
There is no correlation with fds.  [Yetus does not set that ulimit value.]

The process limit is exceedingly tricky.  There is the actual value set by 
ulimit -u and friends.  Then there are cgroup settings enforced by systemd.  
The cgroup limit (set by the UserTasksMax in systemd settings) is the ultimate 
authority.  It also counts across the entire node, not by process group or 
session or any of the other normal boundaries.  The default limit ends up being 
a bit over 12k on the build nodes.

To make matters worse, Java native threads count against this limit.  Running 
'ps -L -u jenkins -o lwp' will give an approximate idea of how many processes 
are in play at any given time.  [The number reported by Yetus when in Docker 
mode is the same count, but only for what is running inside the container.] 

In the end, this means that all threads/processes consumed by BOTH executors 
and the jenkins slave process must be less than ~13k. 

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>

[jira] [Commented] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

2018-01-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347195#comment-16347195
 ] 

Allen Wittenauer commented on HBASE-19902:
--

Copying my comment from HBASE-19887:

===

Chances are that the unit tests are going over the 5k mark. The number in the 
output is what was measured as successfully launched in a given interval. It 
does not measure how many threads were attempted. One way to further test this 
is to set proclimit to something higher (like 10k) and run on H30, which has 
a higher UserTasksMax configured.

===

Two other things:

* Be aware of parallelism.  If parallelism is set to five, two tests are 
running, and three new tests try to launch at the same time with each needing 
900 threads, the run will blow up but the number reported will be low.

* One of the outcomes of HDFS-12711 was finding out that surefire will not 
always report test failures under certain circumstances such as if surefire 
itself starts to OOM.  In other words, if surefire fails to launch a test, it 
may not record ANY result for it.  This means tests may have been failing 
before but were never reported as either success or failure.  They just never 
existed as far as the harness is concerned.  Now, these tests are getting 
reported because the lower limit means troubled tests fail quicker, freeing up 
more resources for surefire to keep pounding away.  See also SUREFIRE-1447.
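
The parallelism point can be sketched as simple arithmetic.  The function name 
and the 2600 figure below are hypothetical; the three launches, the 900 per 
test, and the 5k limit are the numbers used in this thread.

```shell
#!/usr/bin/env bash
# Would launching more tests push the attempted thread count past the limit?
# $1 = threads already in use, $2 = tests about to launch,
# $3 = threads each new test needs, $4 = process limit
would_exceed() {
  [ $(( $1 + $2 * $3 )) -gt "$4" ]
}

# Hypothetical: the two running tests already hold 2600 threads between them.
if would_exceed 2600 3 900 5000; then
  echo "attempted $(( 2600 + 3 * 900 )) threads against a limit of 5000"
fi
```

The launches that fail never show up in the measured maximum, which is why the 
reported number can sit well under the limit even though the limit was hit.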

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -
>
> Key: HBASE-19902
> URL: https://issues.apache.org/jira/browse/HBASE-19902
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19902.temporary-2.001.patch
>
>

[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules

2018-01-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347061#comment-16347061
 ] 

Allen Wittenauer commented on HBASE-19887:
--

Chances are that the unit tests are going over the 5k mark.  The number in the 
output is what was measured as successfully launched in a given interval. It 
does not measure how many threads were attempted. One way to further test this 
is to set proclimit to something higher (like 10k) and run on H30, which has 
a higher UserTasksMax configured.

> Do not overwrite the surefire junit listener property in the pom of sub 
> modules
> ---
>
> Key: HBASE-19887
> URL: https://issues.apache.org/jira/browse/HBASE-19887
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19887-v1.patch, HBASE-19887-v1.patch, 
> HBASE-19887-v1.patch, HBASE-19887-v1.patch, HBASE-19887-v1.patch, 
> HBASE-19887-v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules

2018-01-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345775#comment-16345775
 ] 

Allen Wittenauer commented on HBASE-19887:
--

BTW, 0.7.0 now really does abort when Jenkins sends a kill signal to Yetus 
docker containers. :)

> Do not overwrite the surefire junit listener property in the pom of sub 
> modules
> ---
>
> Key: HBASE-19887
> URL: https://issues.apache.org/jira/browse/HBASE-19887
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-19887-v1.patch, HBASE-19887-v1.patch, 
> HBASE-19887.patch, HBASE-19887.patch, HBASE-19887.patch
>
>






[jira] [Commented] (HBASE-19887) Do not overwrite the surefire junit listener property in the pom of sub modules

2018-01-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345695#comment-16345695
 ] 

Allen Wittenauer commented on HBASE-19887:
--


The job is hitting the new process limit code in 0.7.0:

|Max. process+thread count|923 (vs. ulimit of 1000)|

See 
https://issues.apache.org/jira/browse/HBASE-19898?focusedCommentId=16345664&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16345664
 


> Do not overwrite the surefire junit listener property in the pom of sub 
> modules
> ---
>
> Key: HBASE-19887
> URL: https://issues.apache.org/jira/browse/HBASE-19887
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-19887-v1.patch, HBASE-19887.patch, 
> HBASE-19887.patch, HBASE-19887.patch
>
>






[jira] [Commented] (HBASE-19898) Canary should choose RegionStdOutSink automatically when write sniffing is specified

2018-01-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345664#comment-16345664
 ] 

Allen Wittenauer commented on HBASE-19898:
--

The mvn command never ran.  It looks like the job is hitting either the Yetus 
resource limits or the global resource limits:

https://builds.apache.org/job/PreCommit-HBASE-Build/11260/artifact/patchprocess/coprocessors.txt

Add something like --proclimit=5000 to the command line, which is slightly less 
than half of the max processes that the ASF Infra team has configured. Any more 
than that and jobs will either randomly fail (best case) or cause the node to 
fail (worst case, see HDFS-12711).  

See INFRA-15685 where I'm trying to get it raised to something more reasonable.
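
To show where a figure like 5000 could come from, here is a small sketch that 
splits a node-wide task limit between executors.  The function name, the 12288 
limit, the two executors, and the 1000-task reserve are all assumptions made 
for illustration; only --proclimit=5000 comes from the comment above.

```shell
#!/usr/bin/env bash
# Split a node-wide task limit between executors, holding back a reserve
# for the Jenkins agent and other long-lived processes.
# $1 = node-wide limit, $2 = number of executors, $3 = reserve
per_executor_budget() {
  echo $(( ($1 - $3) / $2 ))
}

# Assumed values: 12288 tasks node-wide, 2 executors, 1000 tasks held back.
budget="$(per_executor_budget 12288 2 1000)"
echo "each executor can use about ${budget} tasks"
```

With those assumed inputs the budget comes out at 5644, so a proclimit of 5000 
leaves a little slack on each executor.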

> Canary should choose RegionStdOutSink automatically when write sniffing is 
> specified
> 
>
> Key: HBASE-19898
> URL: https://issues.apache.org/jira/browse/HBASE-19898
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 19898.v1.txt
>
>
> Currently RegionServerStdOutSink is instantiated by default, even if user 
> specifies -writeSniffing on the command line.
> Write sniffing would be ignored since Sink instance is of 
> RegionServerStdOutSink class:
> {code}
> if (this.sink instanceof RegionServerStdOutSink || this.regionServerMode) 
> {
>   monitor =
>   new RegionServerMonitor(connection, monitorTargets, this.useRegExp,
>   (StdOutSink) this.sink, this.executor, 
> this.regionServerAllRegions,
>   this.treatFailureAsError);
> {code}
> RegionStdOutSink should be used for write sniffing.





[jira] [Commented] (HBASE-17285) Misconfiguration of JVM GC options in HADOOP_CLIENT_OPTS may break `bin/hbase`

2016-12-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740383#comment-15740383
 ] 

Allen Wittenauer commented on HBASE-17285:
--

Unfortunately, the _OPTS handling in most of the Hadoop ecosystem scripts I've 
looked at does very bad things and is pretty much dependent upon using space 
delimiters.  This means that, no, folks can't properly quote these values in 
scripts, and there are some limitations on what they can contain.  This 
obviously causes other problems (the biggest one probably being the inability 
to use directory paths with spaces), which is why shellcheck is throwing a fit.

The only real solution I've found is to convert them all to arrays.  This can 
be done in a somewhat backward-compatible way, but it's a massive amount of 
work, even for the rewritten scripts.  See HADOOP-13365 for what I've started 
doing in Hadoop.
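
A small bash sketch of the difference, using made-up variable names and 
option values (this is not the code from HADOOP-13365 or from bin/hbase):

```shell
#!/usr/bin/env bash
# Print how many arguments were received.
count_args() {
  echo "$#"
}

# String form: the value is split on whitespace when expanded unquoted,
# so a path containing a space turns into two arguments.
EXAMPLE_OPTS='-Dexample.dir=/tmp/my dir -Xmx1g'
# shellcheck disable=SC2086
count_args $EXAMPLE_OPTS                 # prints 3

# Array form: each element stays one argument, spaces included.
EXAMPLE_OPTS_ARRAY=('-Dexample.dir=/tmp/my dir' '-Xmx1g')
count_args "${EXAMPLE_OPTS_ARRAY[@]}"    # prints 2
```

Quoting the string form as "$EXAMPLE_OPTS" does not help either, since that 
passes everything as a single argument.  That is why the array is the only 
form that survives both spaces and multiple options.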

> Misconfiguration of JVM GC options in HADOOP_CLIENT_OPTS may break `bin/hbase`
> --
>
> Key: HBASE-17285
> URL: https://issues.apache.org/jira/browse/HBASE-17285
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17285.001.patch
>
>
> Had the great fun of digging through this one. Had a user reporting that 
> hiveserver2 was no longer finding HBase jars on the classpath. This is 
> supposed to happen via {{hbase mapredcp}}.
> It turned out that they had configured hive-env.sh to set 
> {{HADOOP_CLIENT_OPTS="-XX:+PrintGCDetails"}} (among other things), which 
> creates a big multi-line string instead of just a directory. Because of poor 
> quoting in {{bin/hbase}}, this gives you a wonderfully intuitive error:
> {noformat}
> Error: Could not find or load main class Heap
> {noformat}
> That {{Heap}} is actually from the JVM GC details that it was told to print. 
> While I don't expect this to be a common problem people run into, it's one 
> that we can address with better quoting. e.g.
> {noformat}
> + exec 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin/java 
> -Dproc_mapredcp '-XX:OnOutOfMemoryError=kill -9 %p' -XX:+UseConcMarkSweepGC 
> -Dhbase.log.dir=/usr/local/lib/hbase//logs -Dhbase.log.file=hbase.log 
> -Dhbase.home.dir=/usr/local/lib/hbase/ -Dhbase.id.str= 
> -Dhbase.root.logger=INFO,console 
> '-Djava.library.path='\''/usr/local/lib/hadoop//lib/native' Heap PSYoungGen 
> total 76800K, used 7942K '[0x0007f550,' 0x0007faa8, 
> '0x0008)' eden space 66048K, 12% used 
> '[0x0007f550,0x0007f5cc19c0,0x0007f958)' from space 
> 10752K, 0% used '[0x0007fa00,0x0007fa00,0x0007faa8)' 
> to space 10752K, 0% used 
> '[0x0007f958,0x0007f958,0x0007fa00)' ParOldGen total 
> 174592K, used 0K '[0x0007e000,' 0x0007eaa8, 
> '0x0007f550)' object space 174592K, 0% used 
> '[0x0007e000,0x0007e000,0x0007eaa8)' PSPermGen total 
> 21504K, used 2756K '[0x0007dae0,' 0x0007dc30, 
> '0x0007e000)' object space 21504K, 12% used 
> '[0x0007dae0,0x0007db0b11b8,0x0007dc30)'\''' 
> -Dhbase.security.logger=INFO,NullAppender 
> org.apache.hadoop.hbase.util.MapreduceDependencyClasspathTool
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13525) Update test-patch to leverage Apache Yetus

2016-01-07 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088511#comment-15088511
 ] 

Allen Wittenauer commented on HBASE-13525:
--

FWIW, my current plan for test-patch, etc, in hadoop is in HADOOP-12651.  It 
basically replaces them with wrappers that do downloads, etc.

> Update test-patch to leverage Apache Yetus
> --
>
> Key: HBASE-13525
> URL: https://issues.apache.org/jira/browse/HBASE-13525
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>  Labels: jenkins
> Fix For: 2.0.0
>
> Attachments: HBASE-13525.1.patch
>
>
> Once HADOOP-11746 lands over in Hadoop, incorporate its changes into our 
> test-patch. Most likely easiest approach is to start with the Hadoop version 
> and add in the features we have locally that they don't.





[jira] [Commented] (HBASE-14175) Adopt releasedocmaker for better generated release notes

2015-07-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648539#comment-14648539
 ] 

Allen Wittenauer commented on HBASE-14175:
--

OK, docs are now viewable online here: 
https://github.com/apache/hadoop/blob/HADOOP-12111/dev-support/docs/releasedocmaker.md
 :D

> Adopt releasedocmaker for better generated release notes
> 
>
> Key: HBASE-14175
> URL: https://issues.apache.org/jira/browse/HBASE-14175
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Purtell
> Fix For: 2.0.0
>
>
> We should consider adopting Hadoop's releasedocmaker for better release 
> notes. This would pull out text from the JIRA 'release notes' field with 
> clean presentation and is vastly superior to our current notes, which are 
> simply JIRA's list of issues by fix version. Could hook it into the site 
> build. A convenient part of Yetus to get up and running with. 





[jira] [Commented] (HBASE-14175) Adopt releasedocmaker for better generated release notes

2015-07-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648524#comment-14648524
 ] 

Allen Wittenauer commented on HBASE-14175:
--

BTW, documentation is currently sitting in HADOOP-12228 .  If someone from 
Yetus could +1 it, I'll commit it *hint hint*

bq. so not sure 0.98.0 will work - will be fun to test.

I did a run a while back. ( 
https://github.com/aw-altiscale/eco-release-metadata/tree/master/HBASE  )  It 
works as well as expected.  A few people that have played with the output have 
taken the opportunity to clean things up since it tends to highlight things 
like bogus release notes. The lint mode tries to help with some of those 
things, but it's tuned pretty closely to Hadoop's needs.  

I suspect HBase is going to be in better shape due to building release notes 
from JIRA anyway.

> Adopt releasedocmaker for better generated release notes
> 
>
> Key: HBASE-14175
> URL: https://issues.apache.org/jira/browse/HBASE-14175
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Purtell
> Fix For: 2.0.0
>
>
> We should consider adopting Hadoop's releasedocmaker for better release 
> notes. This would pull out text from the JIRA 'release notes' field with 
> clean presentation and is vastly superior to our current notes, which are 
> simply JIRA's list of issues by fix version. Could hook it into the site 
> build. A convenient part of Yetus to get up and running with. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14163) hbase master stop loops both processes forever

2015-07-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647987#comment-14647987
 ] 

Allen Wittenauer commented on HBASE-14163:
--

I wonder if this is a race condition.

> hbase master stop loops both processes forever
> --
>
> Key: HBASE-14163
> URL: https://issues.apache.org/jira/browse/HBASE-14163
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> It would appear that there is an infinite loop in the zk client connection 
> code when performing a master stop when no external zk servers are configured.





[jira] [Commented] (HBASE-14163) hbase master stop loops both processes forever

2015-07-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646330#comment-14646330
 ] 

Allen Wittenauer commented on HBASE-14163:
--

So, I just set -Djava.net.preferIPv4Stack=true for HBASE_OPTS in hbase-env.sh 
and still see the same behavior, minus trying to use IPv6.

This is on Mac OS X 10.9.5 with JDK 1.7.0_67.
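
For reference, a minimal sketch of the hbase-env.sh change described above. Appending to whatever HBASE_OPTS already holds (rather than overwriting it) is an assumption made for illustration, not necessarily the exact line that was used:

```shell
# hbase-env.sh fragment (sketch): ask the JVM to prefer the IPv4 stack.
# Appending keeps any options already present in HBASE_OPTS; that choice
# is an assumption for illustration.
export HBASE_OPTS="${HBASE_OPTS:-} -Djava.net.preferIPv4Stack=true"
```

With this in place the client should stop trying the IPv6 loopback address, which matches the "minus trying to use IPv6" observation; it does not address the reconnect loop itself.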

> hbase master stop loops both processes forever
> --
>
> Key: HBASE-14163
> URL: https://issues.apache.org/jira/browse/HBASE-14163
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> It would appear that there is an infinite loop in the zk client connection 
> code when performing a master stop when no external zk servers are configured.





[jira] [Commented] (HBASE-14163) hbase master stop loops both processes forever

2015-07-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646153#comment-14646153
 ] 

Allen Wittenauer commented on HBASE-14163:
--

How long did it take for your hbase master to shut down?

> hbase master stop loops both processes forever
> --
>
> Key: HBASE-14163
> URL: https://issues.apache.org/jira/browse/HBASE-14163
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> It would appear that there is an infinite loop in the zk client connection 
> code when performing a master stop when no external zk servers are configured.





[jira] [Commented] (HBASE-13231) shell script rewrite

2015-07-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645011#comment-14645011
 ] 

Allen Wittenauer commented on HBASE-13231:
--

Linking HBASE-14163 as a blocker.

> shell script rewrite
> 
>
> Key: HBASE-13231
> URL: https://issues.apache.org/jira/browse/HBASE-13231
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
> Attachments: HBASE-13231-donotuse.patch
>
>
> This JIRA is for updating the HBase bash scripts to something remotely 
> modern. 





[jira] [Updated] (HBASE-14163) hbase master stop loops both processes forever

2015-07-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HBASE-14163:
-
Component/s: master

> hbase master stop loops both processes forever
> --
>
> Key: HBASE-14163
> URL: https://issues.apache.org/jira/browse/HBASE-14163
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> It would appear that there is an infinite loop in the zk client connection 
> code when performing a master stop when no external zk servers are configured.





[jira] [Updated] (HBASE-14163) hbase master stop loops both processes forever

2015-07-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HBASE-14163:
-
Description: It would appear that there is an infinite loop in the zk 
client connection code when performing a master stop when no external zk 
servers are configured.  (was: It would appear that there is an infinite loop 
in the zk client connection code when performing a master stop when no external 
zk servers are available.)

> hbase master stop loops both processes forever
> --
>
> Key: HBASE-14163
> URL: https://issues.apache.org/jira/browse/HBASE-14163
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> It would appear that there is an infinite loop in the zk client connection 
> code when performing a master stop when no external zk servers are configured.





[jira] [Commented] (HBASE-14163) hbase master stop loops both processes forever

2015-07-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644985#comment-14644985
 ] 

Allen Wittenauer commented on HBASE-14163:
--

I can reproduce this repeatedly using master.

# untar a fresh install with nothing configured except what ships out of the box
# bin/hbase master start
# let it start up
# in another window, bin/hbase master stop

Both processes are now looping:

{code}
2015-07-28 13:16:11,985 INFO  [10.248.3.81:53113.activeMasterManager-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-28 13:16:11,985 WARN  [10.248.3.81:53113.activeMasterManager-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x14ed64a3ddd0004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-07-28 13:16:12,603 INFO  [10.248.3.81:53113.activeMasterManager-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-28 13:16:12,603 WARN  [10.248.3.81:53113.activeMasterManager-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x14ed64a3ddd0004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
{code}

... until a kill or ctrl-c is sent. 
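
The steps above can be scripted roughly as follows. This is only a sketch: the function name, the log file names, and the fixed 60-second startup pause are assumptions made for illustration; only the two bin/hbase commands come from the steps themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper that replays the manual reproduction steps.
# $1: directory of a freshly untarred HBase with nothing configured.
# $2: seconds to wait for the master to start (assumed default: 60).
repro_master_stop_loop() {
  local hbase_home="${1:?usage: repro_master_stop_loop <hbase dir> [wait seconds]}"
  local startup_wait="${2:-60}"

  # Start the master in the background using the out-of-the-box config.
  "${hbase_home}/bin/hbase" master start > master-start.log 2>&1 &
  local start_pid=$!

  # Let it start up (a crude fixed pause rather than a readiness check).
  sleep "${startup_wait}"

  # Stand-in for "in another window": request the stop as a second process.
  "${hbase_home}/bin/hbase" master stop > master-stop.log 2>&1 &
  local stop_pid=$!

  echo "start pid=${start_pid} stop pid=${stop_pid}"
  echo "watch:    tail -f master-start.log master-stop.log"
  echo "clean up: kill ${start_pid} ${stop_pid}"
}
```

Calling repro_master_stop_loop with the path to the untarred tree and then tailing the two logs should show the ClientCnxn reconnect messages repeating in both; the printed kill line is the scripted equivalent of the ctrl-c mentioned above.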

> hbase master stop loops both processes forever
> --
>
> Key: HBASE-14163
> URL: https://issues.apache.org/jira/browse/HBASE-14163
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> It would appear that there is an infinite loop in the zk client connection 
> code when performing a master stop when no external zk servers are available.





[jira] [Created] (HBASE-14163) hbase master stop loops both processes forever

2015-07-28 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HBASE-14163:


 Summary: hbase master stop loops both processes forever
 Key: HBASE-14163
 URL: https://issues.apache.org/jira/browse/HBASE-14163
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Allen Wittenauer


It would appear that there is an infinite loop in the zk client connection code 
when performing a master stop when no external zk servers are available.





[jira] [Commented] (HBASE-13525) Update test-patch to leverage rewrite in Hadoop

2015-05-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554973#comment-14554973
 ] 

Allen Wittenauer commented on HBASE-13525:
--

I'll be around off and on over the weekend if you need help.  I know that 
HADOOP-11929 will be helpful here too, but I need to get HADOOP-11933 in first 
since it's less of an invasive change.  I'll likely finish 11929 relatively 
soon though.  *crosses fingers*

> Update test-patch to leverage rewrite in Hadoop
> ---
>
> Key: HBASE-13525
> URL: https://issues.apache.org/jira/browse/HBASE-13525
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>  Labels: jenkins
> Fix For: 2.0.0
>
>
> Once HADOOP-11746 lands over in Hadoop, incorporate its changes into our 
> test-patch. Most likely easiest approach is to start with the Hadoop version 
> and add in the features we have locally that they don't.





[jira] [Commented] (HBASE-13680) Unhandled exception thrown when "hadoop.rpc.protection" is "privacy" in hdfs and in hbase it is "authentication"

2015-05-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542096#comment-14542096
 ] 

Allen Wittenauer commented on HBASE-13680:
--

Moving this to HBase.  There's nothing Hadoop can do about an HBase stack trace.

> Unhandled exception thrown when "hadoop.rpc.protection" is "privacy" in hdfs 
> and in hbase it is "authentication"
> 
>
> Key: HBASE-13680
> URL: https://issues.apache.org/jira/browse/HBASE-13680
> Project: HBase
>  Issue Type: Bug
>Reporter: Archana T
>Assignee: surendra singh lilhore
>Priority: Minor
>
> Unhandled exception thrown when "hadoop.rpc.protection" is "privacy" in hdfs 
> and in hbase it is "authentication"
> 2015-05-13 22:40:18,772 | FATAL | master:51-196-28-1:21300 | Master server 
> abort: loaded coprocessors are: [org.apache.hadoop.hbase.JMXListener] | 
> org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:2279)
> 2015-05-13 22:40:18,773 | FATAL | master:51-196-28-1:21300 | Unhandled 
> exception. Starting shutdown. | 
> org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:2284)
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:375)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1631)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:500)
>   at





[jira] [Moved] (HBASE-13680) Unhandled exception thrown when "hadoop.rpc.protection" is "privacy" in hdfs and in hbase it is "authentication"

2015-05-13 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer moved HDFS-8389 to HBASE-13680:


Affects Version/s: (was: 2.4.0)
  Key: HBASE-13680  (was: HDFS-8389)
  Project: HBase  (was: Hadoop HDFS)

> Unhandled exception thrown when "hadoop.rpc.protection" is "privacy" in hdfs 
> and in hbase it is "authentication"
> 
>
> Key: HBASE-13680
> URL: https://issues.apache.org/jira/browse/HBASE-13680
> Project: HBase
>  Issue Type: Bug
>Reporter: Archana T
>Assignee: surendra singh lilhore
>Priority: Minor
>
> Unhandled exception thrown when "hadoop.rpc.protection" is "privacy" in hdfs 
> and in hbase it is "authentication"
> 2015-05-13 22:40:18,772 | FATAL | master:51-196-28-1:21300 | Master server 
> abort: loaded coprocessors are: [org.apache.hadoop.hbase.JMXListener] | 
> org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:2279)
> 2015-05-13 22:40:18,773 | FATAL | master:51-196-28-1:21300 | Unhandled 
> exception. Starting shutdown. | 
> org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:2284)
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:375)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1631)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:500)
>   at





[jira] [Commented] (HBASE-13525) Update test-patch to leverage rewrite in Hadoop

2015-04-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505782#comment-14505782
 ] 

Allen Wittenauer commented on HBASE-13525:
--

One of my unstated goals with HADOOP-11746 was for it to be a mostly drop-in 
replacement for any project currently using the older bits. (A quick pass 
through most of the ecosystem reveals that the majority are using some form of 
it or another).  The plug-in capabilities certainly make it easier to add 
custom stuff, but it's a lot harder to fix some assumptions made about the 
source tree layout and maven usage (of course).

(Weirdly, this is the second time that having some generic bits that the entire 
ecosystem could leverage has come up with some major rewrite I've undertaken. 
Roman suggested pulling the base shell scripts out of Hadoop and forming a 
completely separate project!)

Anyway, for better or worse, HBase probably has the most customized version of 
all of them, based upon my quick pass through.  I'll try to offer guidance where 
I can.  It'll be interesting to see what does/doesn't work, especially with the 
new framework. The biggest thing I'm worried about is the backward compatibility 
bits.  I *hope* override works there, but I haven't had a chance to actually 
test that part ;)

> Update test-patch to leverage rewrite in Hadoop
> ---
>
> Key: HBASE-13525
> URL: https://issues.apache.org/jira/browse/HBASE-13525
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Reporter: Sean Busbey
>  Labels: jenkins
> Fix For: 2.0.0
>
>
> Once HADOOP-11746 lands over in Hadoop, incorporate its changes into our 
> test-patch. Most likely easiest approach is to start with the Hadoop version 
> and add in the features we have locally that they don't.





[jira] [Updated] (HBASE-13231) shell script rewrite

2015-03-13 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HBASE-13231:
-
Attachment: HBASE-13231-donotuse.patch

-donotuse:
* "initial" revision from a few months ago

This is just a 'thought balloon' patch.  It does not work. It is incomplete.  
It is full of embarrassing mistakes. There is a lot of copypasta.  Chances are 
good I'll start over.

It is just to stimulate some discussion and generate some ideas about the things 
we'd like to see done differently.

> shell script rewrite
> 
>
> Key: HBASE-13231
> URL: https://issues.apache.org/jira/browse/HBASE-13231
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
> Attachments: HBASE-13231-donotuse.patch
>
>
> This JIRA is for updating the HBase bash scripts to something remotely 
> modern. 





[jira] [Updated] (HBASE-13231) shell script rewrite

2015-03-13 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HBASE-13231:
-
Description: 
This JIRA is for updating the HBase bash scripts to something remotely modern. 


  was:
This JIRA is for updating the HBase shell code to something remotely modern. 



> shell script rewrite
> 
>
> Key: HBASE-13231
> URL: https://issues.apache.org/jira/browse/HBASE-13231
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> This JIRA is for updating the HBase bash scripts to something remotely 
> modern. 





[jira] [Commented] (HBASE-13231) shell script rewrite

2015-03-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360707#comment-14360707
 ] 

Allen Wittenauer commented on HBASE-13231:
--

Just the bash code.  I'll remove the shell component. 

Thanks!

> shell script rewrite
> 
>
> Key: HBASE-13231
> URL: https://issues.apache.org/jira/browse/HBASE-13231
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> This JIRA is for updating the HBase shell code to something remotely modern. 





[jira] [Updated] (HBASE-13231) shell script rewrite

2015-03-13 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HBASE-13231:
-
Component/s: (was: shell)

> shell script rewrite
> 
>
> Key: HBASE-13231
> URL: https://issues.apache.org/jira/browse/HBASE-13231
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> This JIRA is for updating the HBase shell code to something remotely modern. 





[jira] [Commented] (HBASE-13231) shell script rewrite

2015-03-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360670#comment-14360670
 ] 

Allen Wittenauer commented on HBASE-13231:
--

errr, "not near a HBase expert". Woops. haha.

> shell script rewrite
> 
>
> Key: HBASE-13231
> URL: https://issues.apache.org/jira/browse/HBASE-13231
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts, shell
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> This JIRA is for updating the HBase shell code to something remotely modern. 





[jira] [Commented] (HBASE-13231) shell script rewrite

2015-03-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360661#comment-14360661
 ] 

Allen Wittenauer commented on HBASE-13231:
--


About a year ago, [~apurtell] and I were talking about HADOOP-9902 and how the 
work was progressing.  He mentioned that the HBase scripts were based on 
-ancient tomes- the Hadoop shell scripts and that it would be nice to see them 
rewritten as well.  After many -beers- objective points to back his position, 
he convinced me that I should probably take a look and work on them. 

It took a while to finish a big chunk of the work in Hadoop, followed by 
more work as bugs and new ideas popped up (HADOOP-11010). With the help of a 
lot of other folks after the base work was done, that work has mostly slowed 
down to a very stable state, with just a few things to finish up (minus unit 
tests).

A few months back, I started to look at what the state of the HBase code 
actually was.  Again after many -beers- hours of deep analysis, I did a bit of 
playing around, using the Hadoop code as a base.  I had some basic stuff, but 
hit a few potholes, especially when it came to backward compatibility.  I sort 
of put things on hold as HBase 1.0 had shipped and other, non-Apache stuff 
floated to the top.

 [~busbey] knew I was working on said scripts off & on over the past few months 
and suggested I open this JIRA so that he could -hold something over my head- 
potentially get something for 1.1 or (more likely) 2.0.  

I need to do some cleanup, but I'll try and post what I have thus far.  It's in 
an incomplete state (read: not usable), but it will give the community a sense 
of what I think the direction should probably be.  Feedback is always great, 
esp if I've done something completely idiotic. Just bear in mind I'm near a 
HBase expert so there is a very high probability of that occurring.

Just to set expectations: when it comes to this type of thing, compatibility is 
usually a secondary concern, with future capabilities and ease of use usually 
more primary.  In the case of Hadoop, I estimate it is around 80-90% backward 
compatible, with lots of things triggering deprecation warnings.  Of course, the 
community ultimately decides but I wanted to throw that out there.

> shell script rewrite
> 
>
> Key: HBASE-13231
> URL: https://issues.apache.org/jira/browse/HBASE-13231
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts, shell
>Affects Versions: 2.0.0
>Reporter: Allen Wittenauer
>
> This JIRA is for updating the HBase shell code to something remotely modern. 





[jira] [Created] (HBASE-13231) shell script rewrite

2015-03-13 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HBASE-13231:


 Summary: shell script rewrite
 Key: HBASE-13231
 URL: https://issues.apache.org/jira/browse/HBASE-13231
 Project: HBase
  Issue Type: New Feature
  Components: scripts, shell
Affects Versions: 2.0.0
Reporter: Allen Wittenauer


This JIRA is for updating the HBase shell code to something remotely modern. 






[jira] [Commented] (HBASE-11534) Remove broken JAVA_HOME autodetection in hbase-config.sh

2014-07-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064985#comment-14064985
 ] 

Allen Wittenauer commented on HBASE-11534:
--

FWIW, I'm a big fan of "Let the installer figure this out."  i.e., this is one 
place where having a distribution (bigtop or otherwise) is ideal because they 
can get away with taking a lot of time to configure and tune the system as a 
one-time operation.  Taking the hit every time is...excessive.

Anyway, kudos for fixing this.

> Remove broken JAVA_HOME autodetection in hbase-config.sh
> 
>
> Key: HBASE-11534
> URL: https://issues.apache.org/jira/browse/HBASE-11534
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Esteban Gutierrez
>Priority: Minor
> Fix For: 0.99.0, 0.96.3, 0.98.5, 0.94.22, 2.0.0
>
> Attachments: HBASE-11534.patch
>
>
> [~aw] mentioned on Twitter that the old JAVA_HOME autodetection script we 
> have in hbase-config.sh is very unlikely to do the right thing now. Rip it 
> out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)