Re: Where to put some code examples?
Thanks, dev-support sounds good. The only question I have is that there isn't a pom.xml there now. Is that something we'd want to have there? And should it at least be linked to the main build via some option, like -Pdev? -Ray On Tue, Jun 23, 2015 at 10:04 PM, Jay Vyas jayunit100.apa...@gmail.com wrote: Also if they are general Hadoop big data examples we're happy to carry them in bigtop as well ... Especially if they touch multiple areas of the Hadoop ecosystem On Jun 23, 2015, at 11:56 PM, Andrew Wang andrew.w...@cloudera.com wrote: Yea, throw them under dev-support. It'd be good to link them up on the wiki or website too so they're more findable. Related, we could also use a README in dev-support as an overview of what everything does. On Tue, Jun 23, 2015 at 8:19 PM, Sean Busbey bus...@cloudera.com wrote: Could they go under dev-support? On Tue, Jun 23, 2015 at 4:29 PM, Ray Chiang rchi...@cloudera.com wrote: So, as far as I can see, Hadoop has the main developer area for core Hadoop code, unit tests in the test directories, user scripts (like hadoop/mapred/yarn), and build scripts. I've got some utilities that are really for Hadoop contributors. These serve two purposes: 1. These are just generally useful as private API examples 2. They have some utility for developer purposes (e.g. the random .jhist generator I'm working on for MAPREDUCE-6376) Does anyone have suggestions for where such code bits (and possibly corresponding scripts) should go? -Ray -- Sean
[jira] [Created] (HADOOP-12115) Document additional native build dependencies in BUILDING.txt
Kengo Seki created HADOOP-12115: --- Summary: Document additional native build dependencies in BUILDING.txt Key: HADOOP-12115 URL: https://issues.apache.org/jira/browse/HADOOP-12115 Project: Hadoop Common Issue Type: Bug Components: documentation, native Reporter: Kengo Seki Assignee: Kengo Seki On CentOS 6.6, {code}mvn clean compile -DskipTests -Pnative -Drequire.libwebhdfs -Drequire.openssl -Drequire.fuse -Drequire.test.libhadoop{code} fails as follows if libcurl-devel is not installed, although the build environment satisfies the requirements described in BUILDING.txt. {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-hdfs: An Ant BuildException has occured: exec returned: 1 [ERROR] around Ant part ...exec dir=/home/sekikn/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/native executable=cmake failonerror=true... @ 5:119 in /home/sekikn/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml {code} Ant error messages are as follows: {code} [exec] CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message): [exec] Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR) [exec] Call Stack (most recent call first): [exec] /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE) [exec] /usr/share/cmake/Modules/FindCURL.cmake:54 (FIND_PACKAGE_HANDLE_STANDARD_ARGS) [exec] contrib/libwebhdfs/CMakeLists.txt:19 (find_package) {code} It should be listed in BUILDING.txt also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
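For reference, the CMake error above means the FindCURL module could not locate the curl development files. A small pre-flight check along these lines (header path and package name are assumptions for CentOS/RHEL; other distros package it differently) would surface the missing dependency before the long Maven run:

```shell
# Sketch: check for the curl development header that FindCURL.cmake
# needs (CURL_INCLUDE_DIR) before starting a -Drequire.libwebhdfs
# native build. The path and package name below are CentOS/RHEL
# assumptions, not something the build itself provides.
curl_header=/usr/include/curl/curl.h
if [ -e "${curl_header}" ]; then
  status=found
else
  status=missing   # on CentOS 6: sudo yum install libcurl-devel
fi
echo "curl development headers: ${status}"
```

After installing libcurl-devel, re-running the same `mvn clean compile -DskipTests -Pnative ...` invocation should get past the hadoop-hdfs cmake step.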
[jira] [Created] (HADOOP-12118) Validate xml configuration files with XML Schema
Christopher Tubbs created HADOOP-12118: -- Summary: Validate xml configuration files with XML Schema Key: HADOOP-12118 URL: https://issues.apache.org/jira/browse/HADOOP-12118 Project: Hadoop Common Issue Type: Improvement Reporter: Christopher Tubbs I spent an embarrassingly long time today trying to figure out why the following wouldn't work. {code} <property> <key>fs.defaultFS</key> <value>hdfs://localhost:9000</value> </property> {code} I just kept getting an error about no authority for {{fs.defaultFS}}, with a value of {{file:///}}, which made no sense... because I knew it was there. The problem was that the {{core-site.xml}} was parsed entirely without any validation. This seems incorrect. The very least that could be done is a simple XML Schema validation against an XSD, before parsing. That way, users will get immediate failures on common typos and other problems in the xml configuration files.
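To illustrate the class of typo involved: this is not the XSD validation the issue proposes (that would use a real schema-aware validator such as xmllint), just a toy stand-in check, with an assumed whitelist of the element names core-site.xml actually understands, that flags the stray tag immediately:

```shell
# Toy pre-parse check for a Hadoop-style config snippet. The allowed
# tag list is an assumption; real validation should use an XSD.
config='<configuration>
  <property>
    <key>fs.defaultFS</key>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>'

# Extract the opening element names used in the file, then drop the
# ones a configuration file is expected to contain.
unknown=$(printf '%s\n' "$config" \
  | grep -o '<[a-z]*>' | tr -d '<>' \
  | grep -v -w -e configuration -e property -e name -e value -e description -e final \
  || true)
echo "unrecognized tags: ${unknown:-none}"
```

Run against the snippet from the report, this prints `unrecognized tags: key`, which is exactly the immediate failure the reporter wanted instead of a silent fallback to {{file:///}}.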
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
+1 for a separate project and going directly to TLP if possible (as Hadoop itself did when split out of Nutch) +1 for having language discussions once it's a TLP :-) Cheers, Nigel On Jun 22, 2015, at 1:55 PM, Andrew Purtell apurt...@apache.org wrote: On Mon, Jun 22, 2015 at 1:03 PM, Nick Dimiduk ndimi...@gmail.com wrote: On Mon, Jun 22, 2015 at 12:43 PM, Colin P. McCabe cmcc...@apache.org wrote: You mentioned that most of our project will be focused on shell scripts I guess based on the existing test-patch code. Allen did a lot of good work in this area recently. I am curious if you evaluated languages such as Python or Node.js for this use-case. Shell scripts can get a little... tricky beyond a certain size. On the other hand, if we are standardizing on shell, which shell and which version? Perhaps bash 3.5+? I'll also add that shell is not helpful for a cross-platform set of tooling. I recently added a daemon to Apache Phoenix; an explicit requirement was Windows support. I ended up implementing a solution in python because that environment is platform-agnostic and still systems-y enough. I think this is something this project should seriously consider. In my opinion, historically, test-patch hasn't needed to be cross platform because the only first class development environment for Hadoop has been Linux. Growing beyond this could absolutely be one focus of Yetus should that be a consensus goal of the community. The seed of the project, though, is today's test-patch, which is implemented in bash. That's where we are today. Language discussions (smile) can and should be forward looking.
On Tue, Jun 16, 2015 at 7:55 PM, Sean Busbey bus...@cloudera.com wrote: I'm going to try responding to several things at once here, so apologies if I miss anyone and sorry for the long email. :) On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran ste...@hortonworks.com wrote: I think it's good to have a general build/test process projects can share, so +1 to pulling it out. You should get help from others. regarding incubation, it is a lot of work, especially for something that's more of an in-house tool than an artifact to release and redistribute. You can't just use apache labs or the build project's repo to work on this? if you do want to incubate, we may want to nominate the hadoop project as the monitoring PMC, rather than incubator@. -steve Important note: we're proposing a board resolution that would directly pull this code base out into a new TLP; there'd be no incubator, we'd just continue building community and start making releases. The proposed PMC believes the tooling we're talking about has direct applicability to projects well outside of the ASF. Lots of other open source projects run on community contributions and have a general need for better QA tools. Given that problem set and the presence of a community working to solve it, there's no reason this needs to be treated as an in-house build project.
We certainly want to be useful to ASF projects and getting them on-board given our current optimization for ASF infra will certainly be easier, but we're not limited to that (and our current prerequisites, a CI tool and jira or github, are pretty broadly available). On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk ndimi...@apache.org wrote: Since we're tossing out names, how about Apache Bootstrap? It's a meta-project to help other projects get off the ground, after all. There's already a web development framework named Bootstrap[1]. It's also used by several ASF projects, so I think it best to avoid the confusion. The name is, of course, up to the proposed PMC. As a bit of background, the current name Yetus fulfills Allen's desire to have something shell related and my desire to have a project that starts with Y (there are currently no ASF projects that start with Y). The universe of names that fill in these two is very small, AFAICT. I did a brief
[jira] [Resolved] (HADOOP-11915) test-patch.sh should be documented
[ https://issues.apache.org/jira/browse/HADOOP-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-11915. --- Resolution: Implemented test-patch.sh should be documented -- Key: HADOOP-11915 URL: https://issues.apache.org/jira/browse/HADOOP-11915 Project: Hadoop Common Issue Type: Bug Reporter: Allen Wittenauer It might be useful to have all of test-patch.sh's functionality documented, how to use it, power user hints, etc. (esp for the bug bash... )
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
Hi Folks! Work in a feature branch is now being tracked by HADOOP-12111. On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey bus...@cloudera.com wrote: It looks like we have consensus. I'll start drafting up a proposal for the next board meeting (July 15th). Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track that we did due diligence on whatever we pick. In the meantime, Hadoop PMC would y'all be willing to host us in a branch so that we can start prepping things now? We would want branch commit rights for the proposed new PMC. -Sean On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey bus...@cloudera.com wrote: Oof. I had meant to push on this again but life got in the way and now the June board meeting is upon us. Sorry everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in. I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write what I'm hoping we can accomplish in general terms here. All software development projects that are community based (that is, accepting outside contributions) face a common QA problem for vetting incoming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e. test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so often the tooling is worked on ad-hoc and few shared improvements happen across projects. Since the tooling itself is never a primary concern, any improvement made is rarely reused outside of ASF projects.
Over the last couple months a few of us have been working on generalizing the tooling present in the Hadoop code base (because it was the most mature out of all those in the various projects) and it's reached a point where we think we can start bringing on other downstream users. This means we need to start establishing things like a release cadence and to grow the new contributors we have to handle more project responsibility. Personally, I think that means it's time to move out from under Hadoop to drive things as our own community. Eventually, I hope the community can help draw in a group of folks traditionally underrepresented in ASF projects, namely QA and operations folks. I think test-patch by itself has enough scope to justify a project. Having a solid set of build tools that are customizable to fit the norms of different software communities is a bunch of work. Making it work well in both the context of automated test systems like Jenkins and for individual developers is even more work. We could easily also take over maintenance of things like shelldocs, since test-patch is the primary consumer of that currently but it's generally useful tooling. In addition to test-patch, I think the proposed project has some future growth potential. Given some adoption of test-patch to prove utility, the project could build on the ties it makes to start building tools to help projects do their own longer-run testing. Note that I'm talking about the tools to build QA processes and not a particular set of tested components. Specifically, I think the ChaosMonkey work that's in HBase should be generalizable as a fault injection framework (either based on that code or something like it). Doing this for arbitrary software is obviously very difficult, and a part of easing that will be to make (and then favor) tooling to allow projects to have operational glue that looks the same. 
Namely, the shell work that's been done in hadoop-functions.sh would be a great foundational layer that could bring good daemon handling practices to a whole slew of software projects. In the event that these frameworks and tools get adopted by parts of the Hadoop ecosystem, that could make the job of i.e. Bigtop substantially easier. I've reached out to a few folks who have been involved in the current test-patch work or expressed interest in helping out on getting it used in other projects. Right now, the proposed PMC would be (alphabetical by last name): * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds pmc, sqoop pmc, all around Jenkins expert) * Sean Busbey (ASF member, accumulo pmc, hbase pmc) * Nick Dimiduk (hbase pmc, phoenix pmc) * Chris Nauroth (ASF member, incubator pmc, hadoop pmc) * Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc, phoenix pmc) * Allen Wittenauer (hadoop committer) That PMC gives us several members and a bunch of folks familiar with the
Re: Pre-integration tests failing
On Tue, Jun 23, 2015 at 10:43 PM, Alan Burlison alan.burli...@oracle.com wrote: On 24/06/2015 04:22, Sean Busbey wrote: Probably not (barring maven attempting to grab SNAPSHOT versions of other modules while building). What are the machine specs like? the complete unit test set requires a fair bit of machine power (i.e. more than my laptop can handle). The Linux machine is pretty old, it's a 4-core Opteron with 8Gb mem. I haven't attempted test runs on Solaris yet as I know they won't complete successfully. I would try things out on a heavier machine then. I know that I've gotten clean test runs when using a proper server, but never have on my 2 core / 8GB mem laptop. This is an area where we could do a better job of setting expectations for contributors, but I'm not sure we have good stats about what kind of build hardware is needed for a full build. Hopefully it's less than the H* builds.apache machines. :) -- Sean
[jira] [Created] (HADOOP-12116) Fix unrecommended syntax usages in hadoop/hdfs/yarn script for cygwin in branch-2
Li Lu created HADOOP-12116: -- Summary: Fix unrecommended syntax usages in hadoop/hdfs/yarn script for cygwin in branch-2 Key: HADOOP-12116 URL: https://issues.apache.org/jira/browse/HADOOP-12116 Project: Hadoop Common Issue Type: Bug Reporter: Li Lu Assignee: Li Lu We're using syntax like {code}if $cygwin; then{code} which may be erroneously evaluated to true if cygwin is unset. We need to fix this in branch-2.
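The pitfall here is that an unset or empty variable expands to nothing, so bash executes an empty command list whose exit status is 0 and the branch is taken anyway. A minimal sketch (variable names illustrative, not the actual branch-2 script):

```shell
#!/usr/bin/env bash
# Sketch of the pitfall behind HADOOP-12116.
cygwin=""   # simulates the unset/empty case the issue describes

# Fragile pattern: "$cygwin" expands to nothing, bash runs an empty
# command list with exit status 0, so the branch is taken anyway.
fragile_result=no
if $cygwin; then
  fragile_result=yes
fi

# Safer pattern: supply a default and compare against a literal.
safe_result=no
if [[ "${cygwin:-false}" == "true" ]]; then
  safe_result=yes
fi

echo "fragile pattern branch taken: ${fragile_result}"
echo "safe pattern branch taken: ${safe_result}"
```

Running this prints `fragile pattern branch taken: yes` even though cygwin was never set to true, which is exactly the misbehavior the issue describes; the explicit comparison avoids it.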
Re: Where to put some code examples?
On Wed, Jun 24, 2015 at 2:10 AM, Ray Chiang rchi...@cloudera.com wrote: Thanks, dev-support sounds good. The only question I have is that there isn't a pom.xml there now. Is that something we'd want to have there? And should it at least be linked to the main build via some option, like -Pdev? I was working under the assumption that they'd be independent project poms that a dev would have to actively change directories to use. If they're hooked into the main build, I'd say add a new module instead. We already have a few foo-examples modules, so maybe flag it as foo-internal-examples to distinguish from things downstream users should be looking to for guidance. -- Sean
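Purely as a sketch of the -Pdev idea being discussed (the profile id and module name are made up here, borrowing the foo-internal-examples naming suggested above), a profile-gated module in the root pom might look like:

```xml
<!-- Hypothetical: include a developer-examples module only when the
     build is run with -Pdev, keeping it out of the default build. -->
<profiles>
  <profile>
    <id>dev</id>
    <modules>
      <module>hadoop-internal-examples</module>
    </modules>
  </profile>
</profiles>
```

With this in place, `mvn install` would skip the module and `mvn -Pdev install` would build it, which matches the "opt-in for contributors only" intent.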
Re: Where to put some code examples?
On Jun 24, 2015, at 10:03 AM, Sean Busbey bus...@cloudera.com wrote: On Wed, Jun 24, 2015 at 2:10 AM, Ray Chiang rchi...@cloudera.com wrote: Thanks, dev-support sounds good. The only question I have is that there isn't a pom.xml there now. Is that something we'd want to have there? And should it at least be linked to the main build via some option, like -Pdev? I was working under the assumption that they'd be independent project poms that a dev would have to actively change directories to use. If they're hooked into the main build, I'd say add a new module instead. We already have a few foo-examples modules, so maybe flag it as foo-internal-examples to distinguish from things downstream users should be looking to for guidance. Agreed: dev-support is probably the wrong place. That's typically where things that help make the build, build tend to go. There are some examples and samples sprinkled throughout the code base. Plus there is always hadoop-tools as a fallback, which is sort of the 'junk drawer' for these sorts of things. IMO, random .jhist generator sounds like a thing that should go into hadoop-tools.
Re: 2.7.1 status
With a bit of effort from a bunch of contributors/committers, we are finally down to zero blocker/critical issues. Unless, the situation changes, I’ll roll an RC in a day or two. This time for real. Thanks +Vinod On Jun 15, 2015, at 2:58 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: We are down to one blocker and a few critical tickets. I’ll try to push out an RC in a day or two. Thanks +Vinod On Jun 1, 2015, at 10:45 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Tx for the move on that JIRA, folks. https://issues.apache.org/jira/issues/?filter=12331550 still shows 4 blockers / 4 criticals. I'm going to start pushing them in/out. Thanks +Vinod On May 27, 2015, at 3:20 PM, Chris Nauroth cnaur...@hortonworks.com wrote: Thanks, Larry. I have marked HADOOP-11934 as a blocker for 2.7.1. I have reviewed and +1'd it. I can commit it after we get feedback from Jenkins. --Chris Nauroth On 5/26/15, 12:41 PM, larry mccay lmc...@apache.org wrote: Hi Vinod - I think that https://issues.apache.org/jira/browse/HADOOP-11934 should also be added to the blocker list. This is a critical bug in our ability to protect the LDAP connection password in LdapGroupsMapper. thanks! --larry On Tue, May 26, 2015 at 3:32 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Tx for reporting this, Elliot. Made it a blocker, not with a deeper understanding of the problem. Can you please chime in with your opinion and perhaps code reviews? Thanks +Vinod On May 26, 2015, at 10:48 AM, Elliott Clark ecl...@apache.org wrote: HADOOP-12001 should probably be added to the blocker list since it's a regression that can keep ldap from working.
Re: Where to put some code examples?
Agreed. Outside of build tools, you should just file tickets so that we can figure out the right place for the right thing. Thanks +Vinod On Jun 24, 2015, at 10:11 AM, Allen Wittenauer a...@altiscale.com wrote: On Jun 24, 2015, at 10:03 AM, Sean Busbey bus...@cloudera.com wrote: On Wed, Jun 24, 2015 at 2:10 AM, Ray Chiang rchi...@cloudera.com wrote: Thanks, dev-support sounds good. The only question I have is that there isn't a pom.xml there now. Is that something we'd want to have there? And should it at least be linked to the main build via some option, like -Pdev? I was working under the assumption that they'd be independent project poms that a dev would have to actively change directories to use. If they're hooked into the main build, I'd say add a new module instead. We already have a few foo-examples modules, so maybe flag it as foo-internal-examples to distinguish from things downstream users should be looking to for guidance. Agreed: dev-support is probably the wrong place. That's typically where things that help make the build, build tend to go. There are some examples and samples sprinkled throughout the code base. Plus there is always hadoop-tools as a fallback, which is sort of the 'junk drawer' for these sorts of things. IMO, random .jhist generator sounds like a thing that should go into hadoop-tools.
[jira] [Created] (HADOOP-12117) Potential NPE from Configuration#loadProperty with allowNullValueProperties set.
zhihai xu created HADOOP-12117: -- Summary: Potential NPE from Configuration#loadProperty with allowNullValueProperties set. Key: HADOOP-12117 URL: https://issues.apache.org/jira/browse/HADOOP-12117 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.7.1 Reporter: zhihai xu Assignee: zhihai xu Potential NPE from Configuration#loadProperty with allowNullValueProperties set.