Re: Where to put some code examples?

2015-06-24 Thread Ray Chiang
Thanks, dev-support sounds good.  The only question I have is that there
isn't a pom.xml there now.  Is that something we'd want to have there?  And
should it at least be linked to the main build via some option, like -Pdev?
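For what it's worth, opt-in wiring of a directory like that usually looks something like the following in a Maven parent pom. This is a hypothetical fragment; the module path and profile id are illustrative, not taken from Hadoop's actual build:

```xml
<!-- Hypothetical parent pom.xml fragment: the dev-only module is
     built only when -Pdev is passed on the command line. -->
<profiles>
  <profile>
    <id>dev</id>
    <modules>
      <module>dev-support/contributor-examples</module>
    </modules>
  </profile>
</profiles>
```

With that in place, `mvn -Pdev install` would include the module, while a plain `mvn install` would skip it.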

-Ray


On Tue, Jun 23, 2015 at 10:04 PM, Jay Vyas jayunit100.apa...@gmail.com
wrote:

 Also if they are general Hadoop big data examples we're happy to carry them
 in bigtop as well ... Especially if they touch multiple areas of the Hadoop
 ecosystem

  On Jun 23, 2015, at 11:56 PM, Andrew Wang andrew.w...@cloudera.com
 wrote:
 
  Yea, throw them under dev-support. It'd be good to link them up on the
 wiki
  or website too so they're more findable.
 
  Related, we could also use a README in dev-support as an overview of what
  everything does.
 
  On Tue, Jun 23, 2015 at 8:19 PM, Sean Busbey bus...@cloudera.com
 wrote:
 
  Could they go under dev-support?
 
  On Tue, Jun 23, 2015 at 4:29 PM, Ray Chiang rchi...@cloudera.com
 wrote:
 
  So, as far as I can see, Hadoop has the main developer area for core
  Hadoop
  code, unit tests in the test directories, user scripts (like
  hadoop/mapred/yarn), and build scripts.
 
  I've got some utilities that are really for Hadoop contributors.  These
  serve two purposes:
 
  1. These are just generally useful as private API examples
  2. They have some utility for developer purposes (e.g. the random .jhist
     generator I'm working on for MAPREDUCE-6376)
 
  Does anyone have suggestions for where such code bits (and possibly
  corresponding scripts) should go?
 
  -Ray
 
 
 
  --
  Sean
 



[jira] [Created] (HADOOP-12115) Document additional native build dependencies in BUILDING.txt

2015-06-24 Thread Kengo Seki (JIRA)
Kengo Seki created HADOOP-12115:
---

 Summary: Document additional native build dependencies in 
BUILDING.txt
 Key: HADOOP-12115
 URL: https://issues.apache.org/jira/browse/HADOOP-12115
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation, native
Reporter: Kengo Seki
Assignee: Kengo Seki


On CentOS 6.6, {code}mvn clean compile -DskipTests -Pnative 
-Drequire.libwebhdfs -Drequire.openssl -Drequire.fuse 
-Drequire.test.libhadoop{code} fails as follows if libcurl-devel is not 
installed, although the build environment satisfies the requirements described 
in BUILDING.txt.

{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project 
hadoop-hdfs: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...exec 
dir=/home/sekikn/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/native 
executable=cmake failonerror=true... @ 5:119 in 
/home/sekikn/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
{code}

Ant error messages are as follows:

{code}
 [exec] CMake Error at 
/usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
 [exec]   Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
 [exec] Call Stack (most recent call first):
 [exec]   /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 
(_FPHSA_FAILURE_MESSAGE)
 [exec]   /usr/share/cmake/Modules/FindCURL.cmake:54 
(FIND_PACKAGE_HANDLE_STANDARD_ARGS)
 [exec]   contrib/libwebhdfs/CMakeLists.txt:19 (find_package)
{code}

libcurl-devel should be listed in BUILDING.txt as well.
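A sketch of the kind of note BUILDING.txt could gain. The package name is from the reporter's CentOS case; treat the exact mapping as an assumption:

```
Optional native build requirements (CentOS/RHEL package names):

  * libcurl-devel : needed when building with -Drequire.libwebhdfs
                    (contrib/libwebhdfs links against libcurl)

  $ sudo yum install libcurl-devel
```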



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12118) Validate xml configuration files with XML Schema

2015-06-24 Thread Christopher Tubbs (JIRA)
Christopher Tubbs created HADOOP-12118:
--

 Summary: Validate xml configuration files with XML Schema
 Key: HADOOP-12118
 URL: https://issues.apache.org/jira/browse/HADOOP-12118
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Christopher Tubbs


I spent an embarrassingly long time today trying to figure out why the 
following wouldn't work.

{code}
<property>
  <key>fs.defaultFS</key>
  <value>hdfs://localhost:9000</value>
</property>
{code}

I just kept getting an error about no authority for {{fs.defaultFS}}, with a 
value of {{file:///}}, which made no sense... because I knew it was there.

The problem was that the {{core-site.xml}} was parsed entirely without any 
validation. This seems incorrect. The very least that could be done is a simple 
XML Schema validation against an XSD, before parsing. That way, users will get 
immediate failures on common typos and other problems in the xml configuration 
files.
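The suggestion is concrete enough to sketch. Below is a minimal, self-contained illustration of catching the key/name typo at validation time instead of silently ignoring it; the XSD here is illustrative only, not Hadoop's actual configuration schema:

```java
// Hedged sketch, not Hadoop code: shows how an XML Schema (XSD) check could
// reject a <key>-instead-of-<name> typo before the file is ever parsed.
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

public class ConfigXsdCheck {
  // Minimal illustrative schema: each <property> must contain <name>, <value>.
  static final String XSD =
      "<?xml version='1.0'?>"
    + "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + " <xs:element name='configuration'>"
    + "  <xs:complexType><xs:sequence>"
    + "   <xs:element name='property' minOccurs='0' maxOccurs='unbounded'>"
    + "    <xs:complexType><xs:sequence>"
    + "     <xs:element name='name' type='xs:string'/>"
    + "     <xs:element name='value' type='xs:string'/>"
    + "    </xs:sequence></xs:complexType>"
    + "   </xs:element>"
    + "  </xs:sequence></xs:complexType>"
    + " </xs:element>"
    + "</xs:schema>";

  static boolean isValid(String xml) {
    try {
      SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)))
          .newValidator()
          .validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (Exception e) {
      return false; // SAXException: document does not match the schema
    }
  }

  public static void main(String[] args) {
    String typo = "<configuration><property>"
        + "<key>fs.defaultFS</key><value>hdfs://localhost:9000</value>"
        + "</property></configuration>";
    String fixed = typo.replace("key>", "name>");
    System.out.println("typo valid?  " + isValid(typo));   // false: <key> rejected
    System.out.println("fixed valid? " + isValid(fixed));  // true
  }
}
```

Run this way, the misspelled element fails fast with a schema error instead of yielding a mysteriously unset {{fs.defaultFS}}.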



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-24 Thread Nigel Daley
+1 for a separate project and going directly to TLP if possible (as Hadoop 
itself did when split out of Nutch)

+1 for having language discussions once it's a TLP :-)

Cheers,
Nigel

 On Jun 22, 2015, at 1:55 PM, Andrew Purtell apurt...@apache.org wrote:
 
 On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe cmcc...@apache.org
 wrote:
 
 You mentioned that "most of our project will be focused on shell
 scripts", I guess based on the existing test-patch code.  Allen did a
 lot of good work in this area recently.  I am curious if you evaluated
 languages such as Python or Node.js for this use-case.  Shell scripts
 can get a little... tricky beyond a certain size.  On the other hand,
 if we are standardizing on shell, which shell and which version?
 Perhaps bash 3.5+?
 
 I'll also add that shell is not helpful for a cross-platform set of
 tooling. I recently added a daemon to Apache Phoenix; an explicit
 requirement was Windows support. I ended up implementing a solution in
 python because that environment is platform-agnostic and still systems-y
 enough. I think this is something this project should seriously consider.
 
 In my opinion, historically, test-patch hasn't needed to be cross platform
 because the only first class development environment for Hadoop has been
 Linux. Growing beyond this could absolutely be one focus of Yetus should
 that be a consensus goal of the community. The seed of the project, though,
 is today's test-patch, which is implemented in bash. That's where we are
 today. Language discussions (smile) can and should be forward looking.
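On the which-bash question: one common mitigation, sketched hypothetically here rather than taken from test-patch itself, is to fail fast on an unsupported interpreter:

```shell
#!/usr/bin/env bash
# Hypothetical version guard, not from test-patch: scripts that rely on
# modern bash features can check BASH_VERSINFO up front and fail fast.
if [[ -z "${BASH_VERSINFO[0]:-}" || "${BASH_VERSINFO[0]}" -lt 3 ]]; then
  echo "ERROR: this script requires bash 3 or newer" >&2
  exit 1
fi
echo "running under bash ${BASH_VERSION}"
```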
 
 
 On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey bus...@cloudera.com wrote:
 I'm going to try responding to several things at once here, so
 apologies
 if
 I miss anyone and sorry for the long email. :)
 
 
 On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran 
 ste...@hortonworks.com
 wrote:
 
 I think it's good to have a general build/test process projects can
 share,
 so +1 to pulling it out. You should get help from others.
 
 regarding incubation, it is a lot of work, especially for something
 that's
 more of an in-house tool than an artifact to release and redistribute.
 
 You can't just use apache labs or the build project's repo to work on
 this?
 
 if you do want to incubate, we may want to nominate the hadoop project
 as
 the monitoring PMC, rather than incubator@.
 
 -steve
 Important note: we're proposing a board resolution that would directly
 pull
 this code base out into a new TLP; there'd be no incubator, we'd just
 continue building community and start making releases.
 
 The proposed PMC believes the tooling we're talking about has direct
 applicability to projects well outside of the ASF. Lots of other open
 source projects run on community contributions and have a general need
 for
 better QA tools. Given that problem set and the presence of a community
 working to solve it, there's no reason this needs to be treated as an
 in-house build project. We certainly want to be useful to ASF projects
 and
 getting them on-board given our current optimization for ASF infra will
 certainly be easier, but we're not limited to that (and our current
 prerequisites, a CI tool and jira or github, are pretty broadly
 available).
 
 
 On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk ndimi...@apache.org
 wrote:
 
 
 Since we're tossing out names, how about Apache Bootstrap? It's a
 meta-project to help other projects get off the ground, after all.
 
 
 There's already a web development framework named Bootstrap[1]. It's
 also
 used by several ASF projects, so I think it best to avoid the
 confusion.
 
 The name is, of course, up to the proposed PMC. As a bit of background,
 the
 current name Yetus fulfills Allen's desire to have something shell
 related
 and my desire to have a project that starts with Y (there are currently
 no
 ASF projects that start with Y). The universe of names that fill in
 these
 two is very small, AFAICT. I did a brief 

[jira] [Resolved] (HADOOP-11915) test-patch.sh should be documented

2015-06-24 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-11915.
---
Resolution: Implemented

 test-patch.sh should be documented
 --

 Key: HADOOP-11915
 URL: https://issues.apache.org/jira/browse/HADOOP-11915
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Allen Wittenauer

 It might be useful to have all of test-patch.sh's functionality documented, 
 how to use it, power user hints, etc. (esp for the bug bash... )



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)

2015-06-24 Thread Sean Busbey
Hi Folks!

Work in a feature branch is now being tracked by HADOOP-12111.

On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey bus...@cloudera.com wrote:

 It looks like we have consensus.

 I'll start drafting up a proposal for the next board meeting (July 15th).
 Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
 that we did due diligence on whatever we pick.

 In the meantime, Hadoop PMC, would y'all be willing to host us in a branch
 so that we can start prepping things now? We would want branch commit
 rights for the proposed new PMC.


 -Sean


 On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey bus...@cloudera.com wrote:

 Oof. I had meant to push on this again but life got in the way and now
 the June board meeting is upon us. Sorry everyone. In the event that this
 ends up contentious, hopefully one of the copied communities can give us a
 branch to work in.

 I know everyone is busy, so here's the short version of this email: I'd
 like to move some of the code currently in Hadoop (test-patch) into a new
 TLP focused on QA tooling. I'm not sure what the best format for priming
 this conversation is. ORC filled in the incubator project proposal
 template, but I'm not sure how much that confused the issue. So to start,
 I'll just write what I'm hoping we can accomplish in general terms here.

 All software development projects that are community based (that is,
 accepting outside contributions) face a common QA problem for vetting
 incoming contributions. Hadoop is fortunate enough to be sufficiently
 popular that the weight of the problem drove tool development (i.e.
 test-patch). That tool is generalizable enough that a bunch of other TLPs
 have adopted their own forks. Unfortunately, in most projects this kind of
 QA work is an enabler rather than a primary concern, so the tooling is often
 worked on ad hoc and few shared improvements happen across projects. Since
 the tooling itself is never a primary concern, any improvement made is rarely
 reused outside of ASF projects.

 Over the last couple months a few of us have been working on generalizing
 the tooling present in the Hadoop code base (because it was the most mature
 out of all those in the various projects) and it's reached a point where we
 think we can start bringing on other downstream users. This means we need
 to start establishing things like a release cadence and to grow the new
 contributors we have to handle more project responsibility. Personally, I
 think that means it's time to move out from under Hadoop to drive things as
 our own community. Eventually, I hope the community can help draw in a
 group of folks traditionally underrepresented in ASF projects, namely QA
 and operations folks.

 I think test-patch by itself has enough scope to justify a project.
 Having a solid set of build tools that are customizable to fit the norms of
 different software communities is a bunch of work. Making it work well in
 both the context of automated test systems like Jenkins and for individual
 developers is even more work. We could easily also take over maintenance of
 things like shelldocs, since test-patch is the primary consumer of that
 currently but it's generally useful tooling.

 In addition to test-patch, I think the proposed project has some future
 growth potential. Given some adoption of test-patch to prove utility, the
 project could build on the ties it makes to start building tools to help
 projects do their own longer-run testing. Note that I'm talking about the
 tools to build QA processes and not a particular set of tested components.
 Specifically, I think the ChaosMonkey work that's in HBase should be
 generalizable as a fault injection framework (either based on that code or
 something like it). Doing this for arbitrary software is obviously very
 difficult, and a part of easing that will be to make (and then favor)
 tooling to allow projects to have operational glue that looks the same.
 Namely, the shell work that's been done in hadoop-functions.sh would be a
 great foundational layer that could bring good daemon handling practices to
 a whole slew of software projects. In the event that these frameworks and
 tools get adopted by parts of the Hadoop ecosystem, that could make the job
 of, e.g., Bigtop substantially easier.

 I've reached out to a few folks who have been involved in the current
 test-patch work or expressed interest in helping out on getting it used in
 other projects. Right now, the proposed PMC would be (alphabetical by last
 name):

 * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
 pmc, sqoop pmc, all around Jenkins expert)
 * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
 * Nick Dimiduk (hbase pmc, phoenix pmc)
 * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
 * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
 phoenix pmc)
 * Allen Wittenauer (hadoop committer)

 That PMC gives us several members and a bunch of folks familiar with the
 

Re: Pre-integration tests failing

2015-06-24 Thread Sean Busbey
On Tue, Jun 23, 2015 at 10:43 PM, Alan Burlison alan.burli...@oracle.com
wrote:

 On 24/06/2015 04:22, Sean Busbey wrote:

  Probably not (barring maven attempting to grab SNAPSHOT versions of other
 modules while building).

  What are the machine specs like? The complete unit test set requires a fair
  bit of machine power (i.e. more than my laptop can handle).


 The Linux machine is pretty old, it's a 4-core Opteron with 8Gb mem. I
 haven't attempted test runs on Solaris yet as I know they won't complete
 successfully.


I would try things out on a heavier machine then. I know that I've gotten
clean test runs when using a proper server, but never have on my 2 core /
8GB mem laptop.

This is an area where we could do a better job of setting expectations for
contributors, but I'm not sure we have good stats about what kind of build
hardware is needed for a full build. Hopefully it's less than the H*
builds.apache machines. :)


-- 
Sean


[jira] [Created] (HADOOP-12116) Fix unrecommended syntax usages in hadoop/hdfs/yarn script for cygwin in branch-2

2015-06-24 Thread Li Lu (JIRA)
Li Lu created HADOOP-12116:
--

 Summary: Fix unrecommended syntax usages in hadoop/hdfs/yarn 
script for cygwin in branch-2
 Key: HADOOP-12116
 URL: https://issues.apache.org/jira/browse/HADOOP-12116
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu


We're using syntax like {{if $cygwin; then}}, which may be erroneously evaluated 
to true if {{cygwin}} is unset. We need to fix this in branch-2. 
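For readers unfamiliar with the pitfall: when the variable is unset, the expansion leaves an empty command, the empty command succeeds, and the branch is taken. A hypothetical sketch (not the actual hadoop/hdfs/yarn script):

```shell
# Hypothetical sketch, not the actual branch-2 script.
unset cygwin

# Unsafe pattern: $cygwin expands to nothing, leaving an empty command whose
# exit status is 0, so this branch runs even though cygwin was never set.
if $cygwin; then
  echo "unsafe check: took the cygwin branch"
fi

# Safer pattern: compare the value as a string, with an explicit default.
if [ "${cygwin:-false}" = "true" ]; then
  echo "safe check: cygwin"
else
  echo "safe check: not cygwin"
fi
```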



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Where to put some code examples?

2015-06-24 Thread Sean Busbey
On Wed, Jun 24, 2015 at 2:10 AM, Ray Chiang rchi...@cloudera.com wrote:

 Thanks, dev-support sounds good.  The only question I have is that there
 isn't a pom.xml there now.  Is that something we'd want to have there?  And
 should it at least be linked to the main build via some option, like -Pdev?


I was working under the assumption that they'd be independent project poms
that a dev would have to actively change directories to use.

If they're hooked into the main build, I'd say add a new module instead. We
already have a few foo-examples modules, so maybe flag it as
foo-internal-examples to distinguish from things downstream users should
be looking to for guidance.

-- 
Sean


Re: Where to put some code examples?

2015-06-24 Thread Allen Wittenauer

On Jun 24, 2015, at 10:03 AM, Sean Busbey bus...@cloudera.com wrote:

 On Wed, Jun 24, 2015 at 2:10 AM, Ray Chiang rchi...@cloudera.com wrote:
 
 Thanks, dev-support sounds good.  The only question I have is that there
 isn't a pom.xml there now.  Is that something we'd want to have there?  And
 should it at least be linked to the main build via some option, like -Pdev?
 
 
 I was working under the assumption that they'd be independent project poms
 that a dev would have to actively change directories to use.
 
 If they're hooked into the main build, I'd say add a new module instead. We
 already have a few foo-examples modules, so maybe flag it as
 foo-internal-examples to distinguish from things downstream users should
 be looking to for guidance.

Agreed: dev-support is probably the wrong place. That's typically where 
things that help make the build, build tend to go.

There are some examples and samples sprinkled throughout the code base. 
 Plus there is always hadoop-tools as a fallback, which is sort of the 'junk 
drawer' for these sorts of things.  IMO, random .jhist generator sounds like 
a thing that should go into hadoop-tools.

Re: 2.7.1 status

2015-06-24 Thread Vinod Kumar Vavilapalli
With a bit of effort from a bunch of contributors/committers, we are finally 
down to zero blocker/critical issues.

Unless the situation changes, I’ll roll an RC in a day or two. This time for 
real.

Thanks
+Vinod


 On Jun 15, 2015, at 2:58 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 
 We are down to one blocker and a few critical tickets. I’ll try to push out 
 an RC in a day or two.
 
 Thanks
 +Vinod
 
 
 On Jun 1, 2015, at 10:45 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 
 Tx for the move on that JIRA, folks.
 
 https://issues.apache.org/jira/issues/?filter=12331550 still shows 4 
 blockers /  4 criticals. I'm going to start pushing them in/out.
 
 Thanks
 +Vinod
 
 On May 27, 2015, at 3:20 PM, Chris Nauroth cnaur...@hortonworks.com wrote:
 
 Thanks, Larry.  I have marked HADOOP-11934 as a blocker for 2.7.1.  I have
 reviewed and +1'd it.  I can commit it after we get feedback from Jenkins.
 
 --Chris Nauroth
 
 
 
 
 On 5/26/15, 12:41 PM, larry mccay lmc...@apache.org wrote:
 
 Hi Vinod -
 
 I think that https://issues.apache.org/jira/browse/HADOOP-11934 should
 also
 be added to the blocker list.
 This is a critical bug in our ability to protect the LDAP connection
 password in LdapGroupsMapper.
 
 thanks!
 
 --larry
 
 On Tue, May 26, 2015 at 3:32 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 
 Tx for reporting this, Elliot.
 
 Made it a blocker, not with a deeper understanding of the problem. Can
 you
 please chime in with your opinion and perhaps code reviews?
 
 Thanks
 +Vinod
 
 On May 26, 2015, at 10:48 AM, Elliott Clark ecl...@apache.org wrote:
 
 HADOOP-12001 should probably be added to the blocker list since it's a
 regression that can keep ldap from working.
 
 
 
 
 
 
 



Re: Where to put some code examples?

2015-06-24 Thread Vinod Kumar Vavilapalli
Agreed. Outside of build tools, you should just file tickets so that we can 
figure out the right place for the right thing.

Thanks
+Vinod


On Jun 24, 2015, at 10:11 AM, Allen Wittenauer a...@altiscale.com wrote:


On Jun 24, 2015, at 10:03 AM, Sean Busbey bus...@cloudera.com wrote:

On Wed, Jun 24, 2015 at 2:10 AM, Ray Chiang rchi...@cloudera.com wrote:

Thanks, dev-support sounds good.  The only question I have is that there
isn't a pom.xml there now.  Is that something we'd want to have there?  And
should it at least be linked to the main build via some option, like -Pdev?


I was working under the assumption that they'd be independent project poms
that a dev would have to actively change directories to use.

If they're hooked into the main build, I'd say add a new module instead. We
already have a few foo-examples modules, so maybe flag it as
foo-internal-examples to distinguish from things downstream users should
be looking to for guidance.

Agreed: dev-support is probably the wrong place. That's typically where things 
that help make the build, build tend to go.

There are some examples and samples sprinkled throughout the code base.  Plus 
there is always hadoop-tools as a fallback, which is sort of the 'junk drawer' 
for these sorts of things.  IMO, random .jhist generator sounds like a thing 
that should go into hadoop-tools.



[jira] [Created] (HADOOP-12117) Potential NPE from Configuration#loadProperty with allowNullValueProperties set.

2015-06-24 Thread zhihai xu (JIRA)
zhihai xu created HADOOP-12117:
--

 Summary: Potential NPE from Configuration#loadProperty with 
allowNullValueProperties set.
 Key: HADOOP-12117
 URL: https://issues.apache.org/jira/browse/HADOOP-12117
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu


Potential NPE from Configuration#loadProperty with allowNullValueProperties set.
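The report is terse, so as a generic illustration only (this is not Hadoop's actual Configuration code), the failure mode looks like:

```java
// Generic illustration of the hazard, not Hadoop's loadProperty: once null
// values are permitted, any unconditional method call on the value can NPE.
import java.util.HashMap;
import java.util.Map;

public class NullValueNpeDemo {
  static final Map<String, String> props = new HashMap<>();

  static void loadProperty(String name, String value, boolean allowNullValues) {
    if (value == null && !allowNullValues) {
      return; // nothing to load
    }
    // BUG: throws NullPointerException when allowNullValues is true and
    // value is null; the dereference needs a null guard first.
    props.put(name, value.trim());
  }

  static void loadPropertyFixed(String name, String value,
                                boolean allowNullValues) {
    if (value == null) {
      if (allowNullValues) {
        props.put(name, null); // record the key with no value
      }
      return;
    }
    props.put(name, value.trim());
  }

  public static void main(String[] args) {
    loadPropertyFixed("fs.defaultFS", null, true);
    System.out.println(props.containsKey("fs.defaultFS")); // true
    try {
      loadProperty("io.file.buffer.size", null, true);
    } catch (NullPointerException e) {
      System.out.println("NPE from the unguarded path");
    }
  }
}
```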



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)