[jira] [Commented] (HADOOP-12857) Rework hadoop-tools

2016-03-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209247#comment-15209247
 ] 

Hudson commented on HADOOP-12857:
-

FAILURE: Integrated in Hadoop-trunk-Commit #9492 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9492/])
HADOOP-12857. rework hadoop-tools (aw) (aw: rev 
738155063e6fa3f1811e2e875e2e9611f35ef423)
* hadoop-tools/hadoop-sls/src/main/bin/slsrun.sh
* hadoop-tools/hadoop-streaming/pom.xml
* hadoop-tools/hadoop-archive-logs/pom.xml
* hadoop-tools/hadoop-datajoin/pom.xml
* hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh
* hadoop-tools/hadoop-aws/pom.xml
* hadoop-tools/hadoop-rumen/pom.xml
* hadoop-tools/hadoop-sls/src/main/bin/rumen2sls.sh
* hadoop-tools/hadoop-azure/src/site/markdown/index.md
* hadoop-tools/hadoop-openstack/src/site/markdown/index.md
* hadoop-tools/hadoop-sls/pom.xml
* hadoop-tools/hadoop-azure/pom.xml
* hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
* hadoop-common-project/hadoop-common/src/test/scripts/hadoop_add_to_classpath_tools.bats
* dev-support/bin/dist-tools-hooks-maker
* hadoop-common-project/hadoop-common/src/main/bin/hadoop-layout.sh.example
* hadoop-common-project/hadoop-common/src/test/scripts/hadoop_shellprofile.bats
* hadoop-common-project/hadoop-common/src/main/bin/hadoop
* hadoop-common-project/hadoop-common/src/test/scripts/hadoop_entry_tests.bats
* hadoop-tools/hadoop-gridmix/pom.xml
* hadoop-tools/hadoop-openstack/pom.xml
* hadoop-common-project/hadoop-common/src/test/scripts/hadoop_bootstrap.bats
* hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* hadoop-tools/hadoop-extras/pom.xml
* hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh
* hadoop-common-project/hadoop-common/src/test/scripts/hadoop_add_to_classpath_toolspath.bats
* hadoop-tools/hadoop-archives/pom.xml
* hadoop-tools/hadoop-kafka/pom.xml
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-tools/hadoop-distcp/pom.xml
* hadoop-mapreduce-project/bin/mapred
* hadoop-common-project/hadoop-common/src/test/scripts/hadoop_basic_init.bats
* hadoop-dist/pom.xml


> Rework hadoop-tools
> ---
>
> Key: HADOOP-12857
> URL: https://issues.apache.org/jira/browse/HADOOP-12857
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Affects Versions: 3.0.0
> Reporter: Allen Wittenauer
> Assignee: Allen Wittenauer
> Fix For: 3.0.0
>
> Attachments: HADOOP-12857.00.patch, HADOOP-12857.01.patch, 
> HADOOP-12857.02.patch
>
>
> As hadoop-tools grows bigger and bigger, it's becoming evident that having a 
> single directory that gets sucked in is starting to become a big burden as 
> the number of tools grows.  Let's rework this to be smarter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12857) Rework hadoop-tools

2016-03-23 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208773#comment-15208773
 ] 

Ravi Prakash commented on HADOOP-12857:
---

+1. LGTM. Please feel free to commit.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools

2016-03-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199458#comment-15199458
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

bq. I wish there was a way to dynamically add subcommands to hadoop, mapred, 
etc, but the code just isn't quite there yet. We can do usage now, but not 
actual execution.

I now know how to do this in a very clean way that would even allow 3rd parties 
to add subcommands to the shell commands.  It is definitely complementary to 
this patch, but I'll wait for this one to get committed before taking that on, 
since it's a much bigger patch.





[jira] [Commented] (HADOOP-12857) Rework hadoop-tools

2016-03-10 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189837#comment-15189837
 ] 

Allen Wittenauer commented on HADOOP-12857:
---


Manual run without unit tests:

| Vote | Subsystem  | Runtime  | Comment
|   0  | reexec     | 0m 21s   | Docker mode activated.
|   0  | shelldocs  | 0m 4s    | Shelldocs was not available.
|  +1  | @author    | 0m 0s    | The patch does not contain any @author tags.
|  +1  | test4tests | 0m 0s    | The patch appears to include 6 new or modified test files.
|   0  | mvndep     | 3m 48s   | Maven dependency ordering for branch
|  +1  | mvninstall | 11m 42s  | trunk passed
|  +1  | compile    | 13m 16s  | trunk passed
|  +1  | mvnsite    | 12m 55s  | trunk passed
|  +1  | mvneclipse | 2m 37s   | trunk passed
|  +1  | javadoc    | 10m 23s  | trunk passed
|   0  | mvndep     | 0m 20s   | Maven dependency ordering for patch
|  +1  | mvninstall | 23m 27s  | the patch passed
|  +1  | compile    | 11m 19s  | the patch passed
|  +1  | javac      | 11m 19s  | the patch passed
|  +1  | mvnsite    | 13m 4s   | the patch passed
|  +1  | mvneclipse | 0m 57s   | the patch passed
|  +1  | shellcheck | 0m 7s    | The applied patch generated 0 new + 94 unchanged - 5 fixed = 94 total (was 99)
|  +1  | whitespace | 0m 0s    | Patch has no whitespace issues.
|  +1  | xml        | 0m 5s    | The patch has no ill-formed XML file.
|  +1  | javadoc    | 10m 53s  | the patch passed
|  +1  | asflicense | 0m 24s   | Patch does not generate ASF License warnings.
|      |            | 116m 18s |

Manually running unit tests:
{code}
 hadoop-common-project/hadoop-common$ mvn test -DskipTests -Pshelltest

[INFO] --- maven-antrun-plugin:1.7:run (common-test-bats-driver) @ hadoop-common ---
[INFO] Executing tasks

main:
 [exec] Running bats -t hadoop_add_classpath.bats
 [exec] 1..11
 [exec] ok 1 hadoop_add_classpath (simple not exist)
 [exec] ok 2 hadoop_add_classpath (simple wildcard not exist)
 [exec] ok 3 hadoop_add_classpath (simple exist)
 [exec] ok 4 hadoop_add_classpath (simple wildcard exist)
 [exec] ok 5 hadoop_add_classpath (simple dupecheck)
 [exec] ok 6 hadoop_add_classpath (default order)
 [exec] ok 7 hadoop_add_classpath (after order)
 [exec] ok 8 hadoop_add_classpath (before order)
 [exec] ok 9 hadoop_add_classpath (simple dupecheck 2)
 [exec] ok 10 hadoop_add_classpath (dupecheck 3)
 [exec] ok 11 hadoop_add_classpath (complex ordering)
 [exec] Running bats -t hadoop_add_colonpath.bats
 [exec] 1..9
 [exec] ok 1 hadoop_add_colonpath (simple not exist)
 [exec] ok 2 hadoop_add_colonpath (simple exist)
 [exec] ok 3 hadoop_add_colonpath (simple dupecheck)
 [exec] ok 4 hadoop_add_colonpath (default order)
 [exec] ok 5 hadoop_add_colonpath (after order)
 [exec] ok 6 hadoop_add_colonpath (before order)
 [exec] ok 7 hadoop_add_colonpath (simple dupecheck 2)
 [exec] ok 8 hadoop_add_colonpath (dupecheck 3)
 [exec] ok 9 hadoop_add_colonpath (complex ordering)
 [exec] Running bats -t hadoop_add_common_to_classpath.bats
 [exec] 1..3
 [exec] ok 1 hadoop_add_common_to_classpath (negative)
 [exec] ok 2 hadoop_add_common_to_classpath (positive)
 [exec] ok 3 hadoop_add_common_to_classpath (build paths)
 [exec] Running bats -t hadoop_add_javalibpath.bats
 [exec] 1..9
 [exec] ok 1 hadoop_add_javalibpath (simple not exist)
 [exec] ok 2 hadoop_add_javalibpath (simple exist)
 [exec] ok 3 hadoop_add_javalibpath (simple dupecheck)
 [exec] ok 4 hadoop_add_javalibpath (default order)
 [exec] ok 5 hadoop_add_javalibpath (after order)
 [exec] ok 6 hadoop_add_javalibpath (before order)
 [exec] ok 7 hadoop_add_javalibpath (simple dupecheck 2)
 [exec] ok 8 hadoop_add_javalibpath (dupecheck 3)
 [exec] ok 9 hadoop_add_javalibpath (complex ordering)
 [exec] Running bats -t hadoop_add_ldlibpath.bats
 [exec] 1..9
 [exec] ok 1 hadoop_add_ldlibpath (simple not exist)
 [exec] ok 2 hadoop_add_ldlibpath (simple exist)
 [exec] ok 3 hadoop_add_ldlibpath (simple dupecheck)
 [exec] ok 4 hadoop_add_ldlibpath (default order)
 [exec] ok 5 hadoop_add_ldlibpath (after order)
 [exec] ok 6 hadoop_add_ldlibpath (before order)
 [exec] ok 7 hadoop_add_ldlibpath (simple dupecheck 2)
 [exec] ok 8 hadoop_add_ldlibpath (dupecheck 3)
 [exec] ok 9 hadoop_add_ldlibpath (complex ordering)
 [exec] Running bats -t hadoop_add_param.bats
 [exec] 1..4
  

[jira] [Commented] (HADOOP-12857) Rework hadoop-tools

2016-03-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178429#comment-15178429
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

-01:
* combination of '80-'83 subtasks
* fixes a few shellcheck errors (including reworking document_optionals)
* fixes a few whitespace errors
* full path now used for optional components
* hadoop-openstack dependency plugin wasn't getting configured
* set the mode on hadoop-layout.sh.example

(I won't be submitting this to Jenkins.)



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools

2016-03-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178415#comment-15178415
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

OK, (manually) broken up into 4 patches that should be Yetus friendly.  Also 
fixed the above error and all known whitespace and shellcheck errors.  I'll 
upload a combined version as -01 here.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178297#comment-15178297
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

bq. 728m

lol  so the results are completely inaccurate.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178292#comment-15178292
 ] 

Hadoop QA commented on HADOOP-12857:


-1 overall

| Vote | Subsystem  | Runtime  | Comment
|   0  | reexec     | 0m 13s   | Docker mode activated.
|   0  | shelldocs  | 0m 4s    | Shelldocs was not available.
|  +1  | @author    | 0m 0s    | The patch does not contain any @author tags.
|  +1  | test4tests | 0m 0s    | The patch appears to include 5 new or modified test files.
|   0  | mvndep     | 0m 25s   | Maven dependency ordering for branch
|  +1  | mvninstall | 7m 23s   | trunk passed
|  +1  | compile    | 7m 4s    | trunk passed with JDK v1.8.0_72
|  +1  | compile    | 8m 2s    | trunk passed with JDK v1.7.0_95
|  +1  | mvnsite    | 13m 46s  | trunk passed
|  +1  | mvneclipse | 4m 55s   | trunk passed
|  +1  | javadoc    | 9m 42s   | trunk passed with JDK v1.8.0_72
|  +1  | javadoc    | 13m 55s  | trunk passed with JDK v1.7.0_95
|   0  | mvndep     | 0m 17s   | Maven dependency ordering for patch
|  +1  | mvninstall | 10m 48s  | the patch passed
|  +1  | compile    | 8m 25s   | the patch passed with JDK v1.8.0_72
|  +1  | javac      | 8m 25s   | the patch passed
|  +1  | compile    | 8m 27s   | the patch passed with JDK v1.7.0_95
|  +1  | javac      | 8m 27s   | the patch passed
|  +1  | mvnsite    | 13m 11s  | the patch passed
|  +1  | mvneclipse | 4m 30s   | the patch passed
|  -1  | shellcheck | 0m 12s   | The applied patch generated 6 new + 99 unchanged - 0 fixed = 105 total (was 99)
|  -1  | whitespace | 0m 0s    | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
|  +1  | xml        | 0m 6s    | The patch has no ill-formed XML file.
|  +1  | javadoc    | 8m 54s   | the patch passed with JDK v1.8.0_72
|  +1  | javadoc    | 12m 42s  | the patch passed with JDK v1.7.0_95
|  +1  | unit       | 9m 41s   | hadoop-common in the patch passed with JDK v1.8.0_72.
|  -1  | unit       | 70m 13s  | hadoop-hdfs in the patch failed with JDK v1.8.0_72.
|  -1  | unit       | 99m 2s   | hadoop-yarn in the patch failed with JDK v1.8.0_72.
|  +1  | unit       | 7m 27s   | hadoop-streaming in the patch passed with JDK v1.8.0_72.
|  +1  | unit       | 7m 26s   | hadoop-distcp in the patch passed with JDK v1.8.0_72.
|  +1  | unit       | 0m 51s   | hadoop-archives in the patch passed with JDK v1.8.0_72.
| 

[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178186#comment-15178186
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

I'll break this up into four chunks so that the Jenkins timeout doesn't impact 
the patch. :(



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177860#comment-15177860
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

Argh. Patch is too big for Jenkins to handle.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177353#comment-15177353
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

OK, so I'm not imagining it.  Thanks!

FYI, -00 has a dumb bug that you won't see if you run the optional bits from 
the HADOOP_PREFIX dir. Grr.  (The profiles aren't getting built with 
HADOOP_TOOLS_HOME in the path.)

I'll wait to see what Yetus has to say before posting a new patch though.  I'm 
sure there are whitespace and other issues lol.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-02 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177339#comment-15177339
 ] 

Chris Nauroth commented on HADOOP-12857:


bq. Why does the hdfs haadmin command require hadoop-tools in the classpath? Is 
this actually a long standing bug/misunderstanding of where toolrunner comes 
from?

I looked through revision history, and it appears that it was always this way, 
right from the old HDFS-1623 feature branch.  I can't think of any good reason 
for it to do this, so I think it's a bug.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177293#comment-15177293
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

-00:

 tl;dr:  HADOOP_TOOLS_PATH is no longer used in the codebase

* removed toolspath from haadmin because I can't see what it needs from there 
and mvn dependencies don't list anything either
* added various HADOOP_TOOLS_* vars to locate content, similar to what is 
present for the other parts of Hadoop
* added those entries to the various envvars subcommands
* added the necessary hooks to build profiles and built-ins
* changed all of the built-ins to use the specific hooks for them at runtime
* added generic *_entry handlers to deal with comma delimited options
* added ability to turn on built-in optional components from hadoop-env.sh 
without doing anything crazy
* added and modified quite a few shell unit tests to test all this code
* added commons-httpclient back to openstack so I could move forward (see 
HADOOP-12868)

Todo:
* need to update the docs for S3, etc, to explain how to turn them on now
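
A minimal sketch of what handling comma-delimited options might look like, assuming plain bash. The function names (sketch_entry_exists, sketch_entry_add) and the SKETCH_OPTIONAL_TOOLS variable are hypothetical and are not taken from the patch; the actual *_entry handlers may well differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comma-delimited lists; names are illustrative only.

# Return 0 if entry $2 is present in the comma-delimited list stored in the
# variable whose name is passed as $1.
sketch_entry_exists() {
  local list=",${!1},"
  [[ "${list}" == *",$2,"* ]]
}

# Append entry $2 to the list stored in variable $1 unless it is already there.
sketch_entry_add() {
  local var=$1 value=$2
  if ! sketch_entry_exists "${var}" "${value}"; then
    if [[ -z "${!var}" ]]; then
      printf -v "${var}" '%s' "${value}"
    else
      printf -v "${var}" '%s,%s' "${!var}" "${value}"
    fi
  fi
}

# Example: a setting of the kind hadoop-env.sh could carry (assumed name).
SKETCH_OPTIONAL_TOOLS=""
sketch_entry_add SKETCH_OPTIONAL_TOOLS hadoop-aws
sketch_entry_add SKETCH_OPTIONAL_TOOLS hadoop-azure
sketch_entry_add SKETCH_OPTIONAL_TOOLS hadoop-aws   # duplicate, ignored
echo "${SKETCH_OPTIONAL_TOOLS}"
```

Wrapping the list in leading and trailing commas before matching keeps a short name such as "hadoop" from matching inside "hadoop-aws".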



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177090#comment-15177090
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

Why does the hdfs haadmin command require hadoop-tools in the classpath?  Is 
this actually a long standing bug?



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176413#comment-15176413
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

bq. Would it make sense to leave these alone as special cases for now and defer 
improving them to a separate patch? I think the primary benefit of this 
proposal is improved manageability of the truly optional components.

Two things lead me to the answer no:

a) More than half of the bits in hadoop-tools are being called by a script.  (I 
know! it's way more than I expected!)  The optional components are in the 
minority.

b) We'll definitely end up with duplicate jars in the classpath for those bits. 
 (The classpath de-duper doesn't expand the asterisks.) 

But really, it's not that much extra work to just do it in one pass.  I'll 
likely have a patch in the next day or so. (ofc, being unemployed helps haha)
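
Point (b) can be illustrated with a minimal sketch, assuming the de-duper compares classpath entries as plain strings. The function sketch_add_classpath, the /opt/hadoop prefix, and the jar name are hypothetical and only stand in for the real code.

```shell
#!/usr/bin/env bash
# Illustrative only: a string-based de-duper cannot tell that a wildcard
# entry already covers a jar that is later added by its full path.

SKETCH_CLASSPATH=""

# Add $1 to SKETCH_CLASSPATH unless the identical string is already present.
sketch_add_classpath() {
  local entry=$1
  case ":${SKETCH_CLASSPATH}:" in
    *":${entry}:"*) return 0 ;;   # exact duplicate, skip it
  esac
  SKETCH_CLASSPATH="${SKETCH_CLASSPATH:+${SKETCH_CLASSPATH}:}${entry}"
}

sketch_add_classpath "/opt/hadoop/share/hadoop/tools/lib/*"
sketch_add_classpath "/opt/hadoop/share/hadoop/tools/lib/hadoop-distcp.jar"
sketch_add_classpath "/opt/hadoop/share/hadoop/tools/lib/*"   # dropped as a duplicate

# Both the wildcard and the explicit jar survive, so the jar is effectively
# on the classpath twice once the JVM expands the wildcard.
echo "${SKETCH_CLASSPATH}"
```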



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-03-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176220#comment-15176220
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

I have some sample code working.  It was very enlightening and I know what to 
do now.  If we really do want to keep one directory, here's my current plan of 
attack:

* Truly optional components (s3, azure, openstack, kafka, etc), will have a 
shellprofile built that users can enable by doing the necessary incantations.  
I'm currently thinking I might be able to add content to hadoop-env.sh at build 
time to actually turn these things on via a single env-var setting or one per 
feature. No promises.  (Yes, I'm currently looking for my "Black Hat of Bash 
Wizardry" to make this happen.) Worst case, it'll be a "copy and rename to 
HADOOP_CONF_DIR".

* With some help from [~raviprak] to make me see the forest for the trees, I 
can now build shell-parseable dependency lists at build time.  I have two ways 
I can process this:  I can either store these lists in the hadoop-dist target 
directory, or store them in the target directory of the actual tools and use a 
well-known name plus find to build the necessary shell magic at build time.  I'm 
leaning towards the latter since that will allow mvn clean to work in 
hadoop-dist in an expected way, since there won't be a hidden dependency on 
hadoop-tools having been run before the mvn package.

* distch, distcp, archive-logs, etc, are extremely problematic. Using shell 
profiles for these WILL NOT WORK since a) they aren't really optional and b) 
removing them from the command line tools won't really help anyone.  Currently 
these commands load all of HADOOP_TOOLS_PATH, which is awful. I want to add to 
libexec/ a tools directory that stores helper functions for the tools jars that 
are required by the various subcommands.  It will use similar but different code 
from the optional components.  It will key off a different filename for the 
dependency list and there will need to be a contract between the helper 
function names and the dependency file name.  (This sounds worse than it 
is.) 
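
The "contract" in the last bullet could be sketched roughly as follows: a module name determines both the helper file name and the function name that file is expected to define. The directory layout, the sketch_classpath_tools_ prefix, and load_tools_helper are assumptions made for illustration, not the names used by the patch.

```shell
#!/usr/bin/env bash
# Sketch of a name-based contract: module -> helper file -> helper function.
# All names here are hypothetical.

# Source the helper for tools module $2 from directory $1, then call the
# function that the helper is expected to define.
load_tools_helper() {
  local dir=$1 module=$2
  local helper="${dir}/${module}.sh"
  local func="sketch_classpath_tools_${module//-/_}"

  if [[ ! -f "${helper}" ]]; then
    echo "no helper for ${module}" >&2
    return 1
  fi

  # shellcheck disable=SC1090
  . "${helper}"

  if declare -F "${func}" >/dev/null; then
    "${func}"
  else
    echo "${helper} does not define ${func}" >&2
    return 1
  fi
}
```

A subcommand such as distcp would then call something like `load_tools_helper "${libexec_dir}/tools" hadoop-distcp` instead of pulling in every jar under the tools directory; `libexec_dir` is likewise a placeholder.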

I *wish* there was a way to dynamically add subcommands to hadoop, mapred, etc, 
but the code just isn't quite there yet.  We can do usage now, but not actual 
execution.

One big question: How should this work proceed?
# Single patch
# Multiple patches with a strict commit dependency order
# Separate branch followed by a branch merge

Given this work will likely be all or nothing I'm not a fan of multiple patches.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-02-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172799#comment-15172799
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

FWIW, I've got some stupid/simple shell code that takes the output of mvn 
dependency:list and builds a shell profile script.

Some random notes:

* It currently looks for *ALL* of the jars in the tools dir.  This is less than 
efficient for what are hopefully obvious reasons.
* HADOOP-10115 pretty much means that the shell profiles will need to be built 
well after we've processed the hadoop-tools dir in order to know what is/isn't 
already bundled via hadoop-common.
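
For illustration, here is a minimal sketch of turning dependency:list output into a shell profile. It assumes input lines of the form group:artifact:type:version:scope (optionally prefixed with [INFO]), skips lines with a classifier, and assumes a profile hook named after the module; deps_to_jars and make_profile are hypothetical names, and this is not the code described above.

```shell
#!/usr/bin/env bash
# Sketch: convert `mvn dependency:list` style lines into a shell profile.
# Function names and the generated profile layout are assumptions.

# Read dependency lines on stdin and print one jar file name for each
# compile- or runtime-scoped jar dependency.
deps_to_jars() {
  local line group artifact type version scope
  while IFS= read -r line; do
    line=${line#"[INFO]"}         # strip the Maven log prefix if present
    line=${line//[[:space:]]/}    # drop indentation and stray whitespace
    IFS=: read -r group artifact type version scope <<< "${line}"
    [[ "${type}" == "jar" ]] || continue
    case "${scope}" in
      compile|runtime) echo "${artifact}-${version}.jar" ;;
    esac
  done
}

# Read jar names on stdin and print a profile for module $1 that adds each
# jar by its full path rather than via a wildcard.
make_profile() {
  local module=$1 jar
  echo "# generated profile for ${module} (illustrative)"
  echo "function _${module//-/_}_hadoop_classpath"
  echo "{"
  while IFS= read -r jar; do
    echo "  hadoop_add_classpath \"\${HADOOP_TOOLS_HOME}/share/hadoop/tools/lib/${jar}\""
  done
  echo "}"
}
```

Usage would be along the lines of `mvn dependency:list | deps_to_jars | make_profile hadoop-aws`. Filtering out jars already bundled via hadoop-common, as the second note requires, is deliberately left out of this sketch.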


So I'm contemplating two approaches in order to make the latter option work:

# Try to trigger mvn dependency:list in the build stage for those modules that 
need it.  Push the output through the build process up until hadoop-dist gets 
triggered. Take that output and generate the profiles then.
# In hadoop-dist, run mvn dependency:list for all (except some blacklisted 
ones) modules under hadoop-tools (and thus effectively having mvn running mvn), 
and then generate profiles as in #1.

To make matters more complicated, I was informed over the weekend that 
Bigtop-based distributions stupidly merge all of hadoop-tools into 
hadoop-common's lib dir.  So they'll always have the perf hit and other issues 
that having a flat dir structure causes.



[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-02-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172764#comment-15172764
 ] 

Chris Nauroth commented on HADOOP-12857:


Just copy-pasting my comment from HADOOP-12556:

I'm slightly in favor of option 2: keep it as one big dir.  That gives an easy 
out if someone decides they really do want the whole world by putting 
share/hadoop/tools/lib/* on the classpath.  OTOH, I suppose we could come up 
with a "whole world" shell profile that walks a more granular directory 
structure and gathers everything.

In general, I really like the idea of using shell profiles to solve this 
problem.  We still have a gap in that we don't have equivalent functionality on 
Windows.  I have a hunch that it won't be feasible to offer all of the rich 
features of the full shell rewrite in cmd, but maybe we can do just enough to 
support classpath customization through profiles.
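
A "whole world" profile over a more granular layout might be sketched like this, assuming one subdirectory per tool under a common root. The function name gather_tool_dirs and the layout are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: walk a per-tool directory layout and build one classpath string
# with a wildcard entry for each tool directory. Names are illustrative.

# Print a colon-separated list of "<subdir>/*" entries for every
# subdirectory directly under $1.
gather_tool_dirs() {
  local root=$1 dir result=""
  for dir in "${root}"/*/; do
    [[ -d "${dir}" ]] || continue          # glob matched nothing
    result="${result:+${result}:}${dir%/}/*"
  done
  echo "${result}"
}
```

The wildcards are kept literal so that the JVM, not the shell, expands them.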




[jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist

2016-02-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172744#comment-15172744
 ] 

Allen Wittenauer commented on HADOOP-12857:
---

There have been several jiras as of late (e.g., HADOOP-12556 and HADOOP-12721, 
just to name two) where there is growing concern about ease of use vs. 
performance vs. surprises. In HADOOP-12556 I offered up two potential solutions 
to the issue:

* break apart hadoop-tools-dist into multiple directories. create a shell 
profile that sucks in that functionality's entire dir.
* keep hadoop-tools-dist as one big dir (thus making it bw compat, but still 
potentially messy). build a tool that creates shell profiles based upon the 
maven dependency trees to list the specific jars needed by that functionality.


> Rework hadoop-tools-dist
> 
>
> Key: HADOOP-12857
> URL: https://issues.apache.org/jira/browse/HADOOP-12857
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>
> As hadoop-tools grows bigger and bigger, it's becoming evident that having a 
> single directory that gets sucked in is starting to become a big burden.  
> Let's rework this to be smarter.


