Re: Plans of moving towards JDK7 in trunk
On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:

On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote: For the sake of this discussion we should separate the runtime from the programming APIs. Users are already migrating to the java7 runtime for most of the reasons listed below (support, performance, bugs, etc), and the various distributions cert their Hadoop 2 based distributions on java7. This gives users many of the benefits of java7, without forcing users off java6. Ie Hadoop does not need to switch to the java7 programming APIs to make sure everyone has a supported runtime.

+1: you can use Java 7 today; I'm not sure how tested Java 8 is

The question here is really about when Hadoop, and the Hadoop ecosystem (since adjacent projects often end up in the same classpath), start using the java7 programming APIs and therefore break compatibility with java6 runtimes. I think our java6 runtime users would consider dropping support for their java runtime in an update of a major release to be an incompatible change (the binaries stop working on their current jvm).

do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?

I mean 2.x -> 2.(x+1). Ie I'm running the 2.4 stable and upgrading to 2.5.

That may be worth it if we can articulate sufficient value to offset the cost (they have to upgrade their environment, might make rolling upgrades stop working, etc), but I've not yet heard an argument that articulates the value relative to the cost. Eg upgrading to the java7 APIs allows us to pull in dependencies with new major versions, but only if those dependencies don't break compatibility (which is likely given that our classpaths aren't so isolated), and, realistically, only if the entire Hadoop stack moves to java7 as well (eg we have to recompile HBase to generate v1.7 binaries even if they stick on API v1.6). I'm not aware of a feature, bug, etc that really motivates this.

I don't see that being needed unless we move up to new java7+ only libraries and HBase needs to track this. The big recompile-to-work issue is google guava, which is troublesome enough I'd be tempted to say can we drop it entirely.

An alternate approach is to keep the current stable release series (v2.x) as is, and start using new APIs in trunk (for v3). This will be a major upgrade for Hadoop and therefore an incompatible change like this is to be expected (it would be great if this came with additional changes to better isolate classpaths and dependencies from each other). It allows us to continue to support multiple types of users with different branches, vs forcing all users onto a new version. It of course means that 2.x users will not get the benefits of the new API, but it's unclear what those benefits are given they can already get the benefits of adopting the newer java runtimes today.

I'm (personally) +1 to this, I also think we should plan to do the switch some time this year to not only get the benefits, but discover the costs.

Agree
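To make the runtime-versus-API distinction above concrete, here is a minimal, hypothetical sketch (not taken from the thread) of the kind of Java 7-only code under discussion. It uses the java.nio.file API and try-with-resources, both added in Java 7, so it only compiles at -source/-target 1.7 and the resulting class files will not load on a Java 6 JVM; by contrast, code written against the Java 6 APIs runs unchanged on a Java 7 runtime.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class Java7OnlyExample {
      public static void main(String[] args) {
        // java.nio.file.Path/Files and StandardCharsets are Java 7 additions.
        Path p = Paths.get(args.length > 0 ? args[0] : "example.txt");
        // try-with-resources is a Java 7 language feature: it is rejected at
        // -source 1.6, and the class file it produces targets a Java 7 runtime.
        try (BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
          int count = 0;
          while (r.readLine() != null) {
            count++;
          }
          System.out.println(count + " lines in " + p);
        } catch (IOException e) {
          System.err.println("failed to read " + p + ": " + e);
        }
      }
    }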
Re: Plans of moving towards JDK7 in trunk
On Thu, Apr 10, 2014 at 6:49 AM, Raymie Stata rst...@altiscale.com wrote:

I think the problem to be solved here is to define a point in time when the average Hadoop contributor can start using Java7 dependencies in their code. The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does not solve this problem. The average Hadoop contributor wants to see their contributions make it into a stable release in a predictable amount of time. Putting code with a Java7 dependency into trunk means the exact opposite: there is no timeline to a stable release. So most contributors will stay away from Java7 dependencies, despite the nominal policy that they're allowed in trunk. (And the few that do use Java7 dependencies are people who do not value releasing code into stable releases, which arguably could lead to a situation where the Java7-dependent code in trunk is, on average, on the buggy side.) I'm not saying the branch2-in-the-future plan is the only way to solve the problem of putting Java7 dependencies on a known time-table, but at least it solves it. Is there another solution?

All good reasons for why we should start thinking about a plan for v3. The points above pertain to any features for trunk that break compatibility, not just ones that use new Java APIs. We shouldn't permit incompatible changes to merge to v2 just because we don't yet have a timeline for v3, we should figure out the latter. Also motivates finishing the work to isolate dependencies between Hadoop code, other framework code, and user code.

Let's speak less abstractly: are there particular features or new dependencies that you would like to contribute (or see contributed) that require using the Java 1.7 APIs? Breaking compat in v2 or rolling a v3 release are both non-trivial, not something I suspect we'd want to do just because it would be, for example, nicer to have a newer version of Jetty.

Thanks, Eli
Re: Plans of moving towards JDK7 in trunk
to branch-2. I guess this all depends on when we see ourselves shipping Hadoop-3. Any ideas on that?

-- Best regards, - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Plans of moving towards JDK7 in trunk
On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi davi.ottenhei...@emc.com wrote:

From: Eli Collins [mailto:e...@cloudera.com] Sent: Monday, April 07, 2014 11:54 AM

IMO we should not drop support for Java 6 in a minor update of a stable release (v2). I don't think the larger Hadoop user base would find it acceptable that upgrading to a minor update caused their systems to stop working because they didn't upgrade Java. There are people still getting support for Java 6. ... Thanks, Eli

Hi Eli, Technically you are correct that those with extended support get critical security fixes for 6 until the end of 2016. I am curious whether many of those are in the Hadoop user base. Do you know? My guess is the vast majority are within Oracle's official public end of life, which was over 12 months ago. Even Premier support ended Dec 2013: http://www.oracle.com/technetwork/java/eol-135779.html

The end of Java 6 support carries much risk. It has to be considered in terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS score 10.0. http://www.cvedetails.com/cve/CVE-2013-2465/ Since you mentioned "caused systems to stop" as an example of what would be a concern to Hadoop users, please note the CVE-2013-2465 availability impact: Complete (There is a total shutdown of the affected resource. The attacker can render the resource completely unavailable.) This vulnerability was patched in Java 6 Update 51, but post end of life. Apple pushed out the update specifically because of this vulnerability (http://support.apple.com/kb/HT5717) as did some other vendors privately, but for the majority of people, using Java 6 means they have a ticking time bomb. Allowing it to stay should be considered in terms of accepting the whole risk posture.

There are some who get extended support, but I suspect many just have an if-it's-not-broke mentality when it comes to production deployments. The current code supports both java6 and java7 and so allows these people to remain compatible, while enabling others to upgrade to the java7 runtime. This seems like the right compromise for a stable release series. Again, absolutely makes sense for trunk (ie v3) to require java7 or greater.
Re: Plans of moving towards JDK7 in trunk
On Sat, Apr 5, 2014 at 12:54 PM, Raymie Stata rst...@altiscale.com wrote:

To summarize the thread so far:
a) Java7 is already a supported compile- and runtime environment for Hadoop branch2 and trunk
b) Java6 must remain a supported compile- and runtime environment for Hadoop branch2
c) (b) implies that branch2 must stick to Java6 APIs

I wonder if point (b) should be revised. We could immediately deprecate Java6 as a runtime (and thus compile-time) environment for Hadoop. We could end support for it in some published time frame (perhaps 3Q2014). That is, we'd say that all future 2.x releases past some date would not be guaranteed to run on Java6. This would set us up for using Java7 APIs in branch2.

IMO we should not drop support for Java 6 in a minor update of a stable release (v2). I don't think the larger Hadoop user base would find it acceptable that upgrading to a minor update caused their systems to stop working because they didn't upgrade Java. There are people still getting support for Java 6. For the same reason, the various distributions will not want to drop support in a minor update of their products either, and since distros are using the Apache v2.x update releases as the basis for their updates it would mean they have to stop shipping v2.x updates, which makes it harder to collaborate upstream.

Your point with regard to testing and releasing trunk is valid, though we need to address that anyway, outside the context of Java versions.

Thanks, Eli
Re: [VOTE] Release Apache Hadoop 2.4.0
+1 for another rc. There have been quite a few issues found (a handful marked blocker) and this is only the first release candidate; seems like the point of having multiple release candidates is to iterate with another one that addresses the major issues found with the previous one.

On Fri, Apr 4, 2014 at 5:06 PM, Gera Shegalov g...@shegalov.com wrote:

I built the release from the rc tag, enabled the timeline history service and ran a sleep job on a pseudo-distributed cluster. I encourage another rc for 2.4.0 (non-binding):

1) Despite the discussion on YARN-1701, timeline AHS still sets yarn.timeline-service.generic-application-history.fs-history-store.uri to a location under ${hadoop.log.dir} that is meant for the local file system, but uses it on HDFS by default.
2) Critical patch for WebHdfs/Hftp to fix the filesystem contract, HDFS-6143, is not included.
3) Several patches that already proved themselves useful for diagnostics in production and have been available for some months are still not included. MAPREDUCE-5044/YARN-1515 is the most obvious example. Our users need to see where the task container JVM got stuck when it was timed out by the AM.

Thanks, Gera

On Fri, Apr 4, 2014 at 3:51 PM, Azuryy azury...@gmail.com wrote: Arun, do you mean you will cut another RC for 2.4? Sent from my iPhone5s

On April 5, 2014, at 3:50, Arun C. Murthy a...@hortonworks.com wrote: Thanks for helping Tsuyoshi. Pls mark them as Blockers and set the fix-version to 2.4.1. Thanks again. Arun

On Apr 3, 2014, at 11:38 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote: Hi, updated a test result log based on the result of 2.4.0-rc0: https://gist.github.com/oza/9965197 IMO, there are some blockers to be fixed: * MAPREDUCE-5815 (TestMRAppMaster failure) * YARN-1872 (TestDistributedShell failure) * HDFS: TestSymlinkLocalFSFileSystem failure on Linux (I cannot find a JIRA about this failure). Now I'm checking the problem reported by Azuryy. Thanks, - Tsuyoshi

On Fri, Apr 4, 2014 at 8:55 AM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote: Hi, ran tests and confirmed that some tests (TestSymlinkLocalFSFileSystem) fail. The log of the test failure is as follows: https://gist.github.com/oza/9965197 Should we fix or disable the feature? Thanks, - Tsuyoshi

On Mon, Mar 31, 2014 at 6:22 PM, Arun C Murthy a...@hortonworks.com wrote: Folks, I've created a release candidate (rc0) for hadoop-2.4.0 that I would like to get released. The RC is available at: http://people.apache.org/~acmurthy/hadoop-2.4.0-rc0 The RC tag in svn is here: https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0-rc0 The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days. thanks, Arun

-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
Re: Edit permissions for my Hadoop wiki account
Added you both.

On Fri, Jan 3, 2014 at 4:58 PM, Arpit Agarwal aagar...@hortonworks.com wrote: Could some kind admin do the same for my account too? My Hadoop wiki username is ArpitAgarwal Thanks!

On Fri, Jan 3, 2014 at 4:54 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, Could someone give my wiki account edit permissions? Username is AndrewWang. Thanks, Andrew
Re: branch development for HADOOP-9639
+1, good idea. Thanks for contributing Sangjin.

On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote: We have been having discussions on HADOOP-9639 (shared cache for jars) and the proposed design there for some time now. We are going to start work on this and have it vetted and reviewed by the community. I have just filed some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662, YARN-1466, YARN-1467. Rather than working privately in our corner and sharing a big patch at the end, I'd like to explore the idea of developing on a branch in the public to foster more public feedback. Recently the Hadoop PMC has passed the change to the bylaws to allow for branch committers (http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E), and I think it would be a good model for this development. I'd like to propose a branch development and a branch committer status for a couple of us who are going to work on this per the bylaw. Could you please let me know what you think? Thanks, Sangjin
Re: Wiki Permissions
I added your username, you should be able to edit now.

On Tue, Nov 12, 2013 at 1:00 PM, Billy Watson williamrwat...@gmail.com wrote: Sorry, my username is WilliamWatson. William Watson Software Engineer (904) 705-7056 PCS

On Tue, Nov 12, 2013 at 4:00 PM, Billy Watson williamrwat...@gmail.com wrote: Hi, I'd like to get permissions to edit the wiki to both put my company name on the PoweredBy page and to improve the HBase REST docs. William Watson Software Engineer (904) 705-7056 PCS
[jira] [Resolved] (HADOOP-10034) optimize same-filesystem symlinks by doing resolution server-side
[ https://issues.apache.org/jira/browse/HADOOP-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins resolved HADOOP-10034.
Resolution: Duplicate

Dupe of HDFS-932

optimize same-filesystem symlinks by doing resolution server-side
Key: HADOOP-10034
URL: https://issues.apache.org/jira/browse/HADOOP-10034
Project: Hadoop Common
Issue Type: Sub-task
Components: fs
Reporter: Colin Patrick McCabe

We should optimize same-filesystem symlinks by doing resolution server-side rather than client side, as discussed on HADOOP-9780.

This message was sent by Atlassian JIRA (v6.1#6144)
Re: Managing docs with hadoop-1 hadoop-2
On Fri, Oct 18, 2013 at 2:10 PM, Arun C Murthy a...@hortonworks.com wrote:

Folks, Currently http://hadoop.apache.org/docs/stable/ points to hadoop-1. With hadoop-2 going GA, should we just point that to hadoop-2? Couple of options:

# Have stable1/stable2 links: http://hadoop.apache.org/docs/stable1 -> hadoop-1.x, http://hadoop.apache.org/docs/stable2 -> hadoop-2.x

+1, would also make: current -> stable2 (since v2 is the latest), stable -> stable1 (for compatibility). Thanks, Eli

# Just point stable to hadoop-2 and create something new for hadoop-1: http://hadoop.apache.org/docs/hadoop1 -> hadoop-1.x, http://hadoop.apache.org/docs/stable -> hadoop-2.x

We have similar requirements for the *current* link too. Thoughts? thanks, Arun

-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
[jira] [Created] (HADOOP-10055) FileSystemShell.apt.vm doc has typo numRepicas
Eli Collins created HADOOP-10055:

Summary: FileSystemShell.apt.vm doc has typo numRepicas
Key: HADOOP-10055
URL: https://issues.apache.org/jira/browse/HADOOP-10055
Project: Hadoop Common
Issue Type: Bug
Components: documentation
Affects Versions: 2.2.0
Reporter: Eli Collins
Priority: Trivial

HDFS-5139 added numRepicas to FileSystemShell.apt.vm, should be numReplicas.

This message was sent by Atlassian JIRA (v6.1#6144)
Re: streaming documentation in Hadoop 2?
This is MAPREDUCE-4282 On Mon, Oct 14, 2013 at 3:28 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Doc existed in MR1 http://hadoop.apache.org/docs/stable/streaming.html, but it looks like it and a bunch of other stuff (e.g. Rumen and the MapReduce Tutorial) weren't ported over. On Mon, Oct 14, 2013 at 3:20 PM, Eli Collins e...@cloudera.com wrote: It probably just needs doc, I'd go ahead and file a jira for it. The wiki content here could be a good starting point. On Mon, Oct 14, 2013 at 2:56 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi All, I noticed that the hadoop streaming documentation does not exist in the Hadoop 2 source tree, and also cannot be found on the internet. Is this on purpose? I found this wiki page http://wiki.apache.org/hadoop/HadoopStreaming - is that where doc is supposed to go? As this page isn't tied to a specific version, how does it work if new options are added? thanks, -Sandy
Re: symlink support in Hadoop 2 GA
On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran ste...@hortonworks.com wrote:

On 18 September 2013 12:53, Alejandro Abdelnur t...@cloudera.com wrote:

On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran ste...@hortonworks.com wrote: I'm reluctant for this as while delaying the release, because we are going to find problems all the way up the stack -- which will require a choreographed set of changes. Given the grief of the protobuf update, I don't want to go near that just before the final release.

Well, I would use the exact same argument used for protobuf (whose only complication was getting protoc 2.5.0 on the jenkins boxes and communicating to developers to do the same; other than that we didn't hit any other issue AFAIK) ...

protobuf was traumatic at build time, as I recall, because it was neither forwards nor backwards compatible. Those of us trying to build different branches had to choose which version to have on the path, or set up scripts to do the switching. HBase needed rebuilding, so did other things. And I still have the pain of downloading and installing protoc on all Linux VMs I build up going forward, until apt-get and yum have protoc 2.5 artifacts. This means it was very painful for developers, added a lot of late-breaking pain to the developers, but it had one key feature that gave it an edge: it was immediately obvious where you had a problem as things didn't compile or classload without linkage problems. No latent bugs, unless protobuf 2.5 has them internally -- for which we have to rely on google's release testing to have found them. That is a lot simpler to regression test than adding any new feature to HDFS and seeing what breaks -- as that is something that only surfaces out in the field.

Which is why I think it's too late in the 2.1 release timetable to add symlinks. We've had a 2.1-beta out there, we've got feedback. Fix those problems that are show stoppers, but don't add more stuff. Which is precisely why I have not been pushing in any of my recent changes. I may seem ruthless arguing against symlinks -- but I'm not being inconsistent with my own commit history. The only two things I've put in branch-2.1 since beta-1 were a separate log for the Configuration deprecation warnings and a patch to the POM for a java7 build on OSX: and they weren't even my patches. -Steve

(One of these days I should volunteer to be the release manager and it'll be obvious that Arun is being quite amenable to all the other developers)

IMO, it makes more sense to do this change during the beta rather than when GA. That gives us more flexibility to iron out things if necessary.

I'm arguing this change can go into the beta of the successor to 2.1 -- not GA.

What does "this change" refer to? Symlinks are already in 2.1, and the existing semantics create problems for programs (eg see the pig example in HADOOP-9912) that we need to resolve. I don't think "do nothing" is an option for 2.2 GA.

Thanks, Eli
Re: symlink support in Hadoop 2 GA
(Looping in Arun since this impacts 2.x releases)

I updated the versions on HADOOP-8040 and sub-tasks to reflect where the changes have landed. All of these changes (modulo HADOOP-9417) were merged to branch-2.1 and are in the 2.1.0 release. While symlinks are in 2.1.0, I don't think we can really claim they're ready until issues like HADOOP-9912 are resolved, and they are supported in the shell, distcp and WebHDFS/HttpFS/Hftp (these are not esoteric!). Someone can create a symlink with FileSystem causing someone else's distcp job to fail. Unlikely given they're not exposed outside the Java API, but still not great. Ideally this work would have been done on a feature branch and then merged when complete, but that's water under the bridge. I see the following options:

1. Fix up the current symlink support so that symlinks are ready for 2.2 (GA), or at least the public APIs. This means the APIs will be in GA from the get-go, so while the functionality might not be fully baked we don't have to worry about incompatible changes like FileStatus#isDir() changing behavior in 2.3 or a later update. The downside is this will take at least a couple weeks (to resolve HADOOP-9912 and potentially implement the remaining pieces) and so may impact the 2.2 release timing. This option means 2.2 won't remove the new APIs introduced in 2.1. We'd want to spin a 2.1.2 beta with the new API changes so we don't introduce new APIs in the beta to GA transition.

2. Revert symlinks from branch-2.1-beta and branch-2. Finish up the work in trunk (or a feature branch) and merge for a subsequent 2.x update. While this helps get us to GA faster, it would be preferable to get an API change like this in for 2.2 GA since it may be disruptive to introduce in an update (eg see the example in #1). And of course our users would like symlinks functionality in the GA release. This option would mean 2.2 is incompatible with 2.1 because it's dropping the new APIs, not ideal for a beta to GA transition.

3. Revert and punt symlinks to 3.x. IMO this should be the last resort.

If we have sufficient time I think option #1 would be best. What do others think?

Thanks, Eli

On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang andrew.w...@cloudera.com wrote:

Hi all, I wanted to broadcast plans for putting the FileSystem symlinks work (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think it's pretty important we get it in since it's not a compatible change; if it misses the GA train, we're not going to have symlinks until the next major release. However, we're still dealing with ongoing issues revealed via testing. There's user code out there that only handles files and directories and will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912 for a nice example where globStatus returning symlinks broke Pig; some of us had a conference call to talk it through, and one definite conclusion was that this wasn't solvable in a generally compatible manner. There are also still some gaps in symlink support right now. For example, the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink resolution, and tooling like the FsShell and Distcp still need to be updated as well. So, there's definitely work to be done, but there are a lot of users interested in the feature, and symlinks really should be in GA. Would appreciate any thoughts/input on the matter.

Thanks, Andrew
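As a concrete illustration of the "user code that only handles files and directories" problem described above, here is a small, hypothetical sketch (not from the thread, and not the actual HADOOP-9912 fix) of client code that handles symlinks explicitly when walking a listing instead of assuming every entry is a file or a directory; whether listings actually surface symlink statuses depends on the filesystem and release in question:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListingWithSymlinks {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Pre-symlink client code typically branches only on isDirectory()/isFile();
        // a (possibly dangling) symlink breaks that assumption, which is the Pig
        // failure mode discussed in HADOOP-9912.
        for (FileStatus st : fs.listStatus(new Path(args[0]))) {
          if (st.isSymlink()) {
            System.out.println("symlink: " + st.getPath() + " -> " + st.getSymlink());
          } else if (st.isDirectory()) {
            System.out.println("dir: " + st.getPath());
          } else {
            System.out.println("file: " + st.getPath());
          }
        }
      }
    }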
Re: What's the status of IPv6 support?
Hey Andrew, I'm not aware of anyone working on IPv6 support. The following looks up to date and may be helpful: http://wiki.apache.org/hadoop/HadoopIPv6 Thanks, Eli On Fri, Aug 16, 2013 at 8:27 AM, Andrew Pennebaker apenneba...@42six.com wrote: When will Hadoop get better IPv6 support? It would be nice if Ubuntu users didn't have to disable IPv6 in order to get Hadoop working.
Re: What's the status of IPv6 support?
On Friday, August 16, 2013, Steve Loughran wrote:

On 16 August 2013 10:05, Andrew Pennebaker apenneba...@42six.com wrote: Thanks for the link! I understand a lack of support *for* IPv6, what I don't understand is why IPv6 must be disabled in order for Hadoop to work. On systems with both IPv4 and IPv6, I thought IPv4 apps could ignore IPv6 as if it weren't available. I suppose that's not quite right.

Historically the JVM IPv6 support on Linux has been pretty troublesome -- I don't know if it's better now, because we all avoid it. I'm not sure you have to disable it as long as all java processes get started with -Djava.net.preferIPv4Stack=true as a JVM arg.

That's correct, I've run Hadoop clusters with ipv6 enabled using these options (they were checked in by default awhile back).

If you don't say that then the JVM keeps trying to create sockets at the IPv6 level and then fall back to IPv4 if it can't connect, which adds overhead to all connections. -steve

On Fri, Aug 16, 2013 at 12:44 PM, Eli Collins e...@cloudera.com wrote: Hey Andrew, I'm not aware of anyone working on IPv6 support. The following looks up to date and may be helpful: http://wiki.apache.org/hadoop/HadoopIPv6 Thanks, Eli

On Fri, Aug 16, 2013 at 8:27 AM, Andrew Pennebaker apenneba...@42six.com wrote: When will Hadoop get better IPv6 support? It would be nice if Ubuntu users didn't have to disable IPv6 in order to get Hadoop working.
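Here is a minimal, standalone sketch (not Hadoop code; the class name and behavior notes are illustrative assumptions) of what the -Djava.net.preferIPv4Stack=true flag mentioned above does. The property has to be supplied as a JVM argument at startup, since it is read when the networking classes initialize; for Hadoop daemons that typically means adding it to the Java options in hadoop-env.sh rather than setting it in code.

    import java.net.InetAddress;
    import java.net.ServerSocket;

    // Run twice to compare, e.g.:
    //   java Ipv4StackCheck
    //   java -Djava.net.preferIPv4Stack=true Ipv4StackCheck
    public class Ipv4StackCheck {
      public static void main(String[] args) throws Exception {
        System.out.println("java.net.preferIPv4Stack = "
            + System.getProperty("java.net.preferIPv4Stack"));
        ServerSocket ss = new ServerSocket(0);
        try {
          // On a dual-stack Linux host without the flag, the JVM typically binds
          // an IPv6 wildcard address; with the flag it stays on the IPv4 stack.
          System.out.println("bound to " + ss.getLocalSocketAddress());
          System.out.println("localhost resolves to "
              + InetAddress.getByName("localhost").getHostAddress());
        } finally {
          ss.close();
        }
      }
    }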
Re: Java 7 and Hadoop
You should be able to login to builds.apache.org using your Apache credentials and create a job. I'd copy the Hadoop trunk job and just update the JDK dropdown to JDK 1.7 (latest). You can ping me today if that doesn't work.

On Fri, Aug 2, 2013 at 10:55 AM, Sandy Ryza sandy.r...@cloudera.com wrote: How do we go about setting up builds on jdk7? Couldn't figure out how to do it on Jenkins, so I'm assuming it either requires PMC or Infra? -Sandy

On Thu, Jul 25, 2013 at 10:43 AM, Santhosh M S santhosh_mut...@yahoo.com wrote: Resurrecting an old thread - now that JDK 6 is EOL should we switch over to JDK 7 across the board? Thoughts? Thanks, Santhosh

From: Eli Collins e...@cloudera.com To: common-dev@hadoop.apache.org Sent: Tuesday, July 31, 2012 6:19 PM Subject: Re: Java 7 and Hadoop

We should officially support it in Hadoop (one piece of BIGTOP-458 which is for supporting it across the whole stack). Now that HADOOP-8370 (fixes native compilation) is in, the full tarball should work. A good next step would be updating JAVA_HOME on one of the Hadoop jenkins jobs to use jdk7.

On Tue, Jul 31, 2012 at 9:17 AM, Thomas Graves tgra...@yahoo-inc.com wrote: I've seen more and more people using java 7. We are also planning to move to java 7 due to the eol of java 6 that Scott referenced. What are folks' thoughts on making it officially supported by Hadoop? Is there a process for this or is it simply updating the wiki Eli mentioned after sufficient testing? Thanks, Tom

On 4/26/12 4:25 PM, Eli Collins e...@cloudera.com wrote: Hey Scott, Nice. Please update this page with your experience when you get a chance: http://wiki.apache.org/hadoop/HadoopJavaVersions Thanks, Eli

On Thu, Apr 26, 2012 at 2:03 PM, Scott Carey sc...@richrelevance.com wrote: Java 7 update 4 has been released. It is even available for MacOS X from Oracle: http://www.oracle.com/technetwork/java/javase/downloads/jdk-7u4-downloads-1591156.html Java 6 will reach end of life in about 6 months. After that point, there will be no more public updates from Oracle for Java 6, even security updates. https://blogs.oracle.com/henrik/entry/updated_java_6_eol_date You can of course pay them for updates or build your own OpenJDK. The entire Hadoop ecosystem needs to test against Java 7 JDKs this year. I will be testing some small clusters of ours with JDK 7 in about a month, and my internal projects will start using Java 7 features shortly after. See the JDK roadmap: http://blogs.oracle.com/javaone/resource/java_keynote/slide_15_full_size.gif https://blogs.oracle.com/java/entry/moving_java_forward_java_strategy
Re: Hadoop 2.0 - org.apache.hadoop.fs.Hdfs vs. DistributedFileSystem?
Hey Steve, That's correct, see HADOOP-6223 for the history. However, per Andrew I don't think it's realistic to expect people to migrate off FileSystem for a while (I filed HADOOP-6446 well over three years ago). The unfortunate consequence of the earlier decision to have parallel interfaces rather than transition one over time means people effectively need to end up implementing multiple backends - one that gets used by clients of FileSystem, and one for clients of FileContext. Implementing in only one place significantly limits adoption of the feature or file system because they can't be effectively adopted in practice unless they're available to old and new clients (for example, this is why symlinks are getting backported to FileSystem from FileContext). Thanks, Eli On Tue, Jun 18, 2013 at 11:15 AM, Stephen Watt sw...@redhat.com wrote: Hi Folks My understanding is that from Hadoop 2.0 onwards the AbstractFileSystem is now the strategic class to extend for writing Hadoop FileSystem plugins. This is a departure from previous versions where one would extend the FileSystem class. This seems to be reinforced by the hadoop-default.xml for Hadoop 2.0 in the Apache Wiki (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/core-default.xml) which shows fs.AbstractFileSystem.hdfs.impl being set to org.apache.hadoop.fs.Hdfs Is my assertion correct? Do we have community consensus around this? i.e. Beyond the apache distro, are the commercial distros (Intel, Hortonworks, Cloudera, WanDisco, EMC Pivotal, etc.) using org.apache.hadoop.fs.Hdfs as their filesystem plugin for HDFS? What does one lose by using the DistributedFileSystem class instead of the Hdfs class? Regards Steve Watt - Original Message - From: Andrew Wang andrew.w...@cloudera.com To: common-dev@hadoop.apache.org Cc: Milind Bhandarkar mbhandar...@gopivotal.com, shv hadoop shv.had...@gmail.com, Steve Loughran ste...@hortonworks.com, Kun Ling erlv5...@gmail.com, Roman Shaposhnik shaposh...@gmail.com, Andrew Purtell apurt...@apache.org, cdoug...@apache.org, jayh...@cs.ucsc.edu, Sanjay Radia san...@hortonworks.com Sent: Friday, June 14, 2013 1:32:38 PM Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop Hey Steve, I agree that it's confusing. FileSystem and FileContext are essentially two parallel sets of interfaces for accessing filesystems in Hadoop. FileContext splits the interface and shared code with AbstractFileSystem, while FileSystem is all-in-one. If you're looking for the AFS equivalents to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs. Realistically, FileSystem isn't going to be deprecated and removed any time soon. There are lots of 3rd-party FileSystem implementations, and most apps today use FileSystem (including many HDFS internals, like trash and the shell). When I read the wiki page, I figured that the mention of AFS was essentially a typo, since everyone's been steaming ahead with FileSystem. Standardizing FileSystem makes total sense to me, I just wanted to confirm that plan. Best, Andrew On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt sw...@redhat.com wrote: This is a good point Andrew. The hangout was actually the first time I'd heard about the AbstractFileSystem class. I've been doing some further analysis on the source in Hadoop 2.0 and when I look at the Hadoop 2.0 implementation of DistributedFileSystem and LocalFileSystem class they extend the FileSystem class and not AbstractFileSystem. 
I would imagine if the plan for Hadoop 2.0 is to build FileSystem implementations using the AbstractFileSystem, then those two would use it, so I'm a bit confused. Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you clarify this for us? Regards Steve Watt - Original Message - From: Andrew Wang andrew.w...@cloudera.com To: common-dev@hadoop.apache.org Cc: mbhandar...@gopivotal.com, shv hadoop shv.had...@gmail.com, ste...@hortonworks.com, erlv5...@gmail.com, shaposh...@gmail.com, apurt...@apache.org, cdoug...@apache.org, jayh...@cs.ucsc.edu, san...@hortonworks.com Sent: Monday, June 10, 2013 5:14:16 PM Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop Thanks for the summary Steve, very useful. I'm wondering a bit about the point on testing AbstractFileSystem rather than FileSystem. While these are both wrappers for DFSClient, they're pretty different in terms of the APIs they expose. Furthermore, AFS is not actually a client-facing API; clients interact with an AFS through FileContext. I ask because I did some work trying to unify the symlink tests for both FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things like the default mkdir semantics are different; you can see some of the contortions in HADOOP-9370. I ultimately ended up just adhering to
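For readers following the FileSystem-versus-FileContext/AbstractFileSystem discussion above, here is a small, hypothetical sketch (assumes a configured default filesystem and an existing directory; not taken from the thread) showing the two parallel client APIs performing the same listing. Plugin authors end up backing both sides, which is the duplication Eli describes:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TwoClientApis {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path dir = new Path(args.length > 0 ? args[0] : "/tmp");

        // Older, all-in-one API: clients call FileSystem, and filesystem
        // implementors extend FileSystem (e.g. DistributedFileSystem).
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus st : fs.listStatus(dir)) {
          System.out.println("FileSystem:  " + st.getPath());
        }

        // Newer, split API: clients call FileContext, which delegates to an
        // AbstractFileSystem implementation (e.g. org.apache.hadoop.fs.Hdfs).
        FileContext fc = FileContext.getFileContext(conf);
        for (FileStatus st : fc.util().listStatus(dir)) {
          System.out.println("FileContext: " + st.getPath());
        }
      }
    }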
Re: Cannot update PoweredBy wiki page
Hey Adam, If you create a user ID for the Hadoop wiki and send it to me I'll update this page which will give you perms. http://wiki.apache.org/hadoop/ContributorsGroup On Thu, May 30, 2013 at 3:26 PM, Adam Kawa kawa.a...@gmail.com wrote: Could someone please add me? I promise not to break anything ;) 2013/5/31 Konstantin Boudnik c...@apache.org Someone with proper karma needs to add you into ACL for this page. On Thu, May 30, 2013 at 11:14PM, Adam Kawa wrote: Hi, When uploading new content (and information about my company), I got the exception Sorry, can not save page because rubbelloselotto.de is not allowed in this wiki. How could I solve it? Kind regards, Adam
Re: [PROPOSAL] change in bylaws to remove Release Plan vote
+1 thanks Matt.

On Tue, May 21, 2013 at 2:10 PM, Matt Foley ma...@apache.org wrote:

Hi all, This has been a side topic in several email threads recently. Currently we have an ambiguity. We have a tradition in the dev community that any committer can create a branch, and propose release candidates from it. Yet the Hadoop bylaws say that releases have to be planned in advance, the plan needs to be voted on, and presumably can be denied. Apache policies (primarily here http://www.apache.org/dev/release.html and here http://www.apache.org/foundation/voting.html, with non-normative commentary here http://incubator.apache.org/guides/releasemanagement.html#best-practice) are very clear on how Releases have to be approved, and our bylaws are consistent with those policies. But Apache policies don't say anything I've found about Release Plans, nor about voting on Release Plans.

I propose the following change, to remove Release Plan votes, and give a simple definition of the Release Manager role. I'm opening discussion with this proposal, and will put it to a vote if we seem to be getting consensus. Here's the changes I suggest in the Bylaws (http://hadoop.apache.org/bylaws.html) document:

===

1. In the Decision Making : Actions section of the Bylaws, the following text is removed:

* Release Plan: Defines the timetable and actions for a release. The plan also nominates a Release Manager. Lazy majority of active committers.

2. In the Roles and Responsibilities section of the Bylaws, an additional role is defined:

* Release Manager: A Release Manager (RM) is a committer who volunteers to produce a Release Candidate according to HowToRelease (https://wiki.apache.org/hadoop/HowToRelease). The RM shall publish a Release Plan on the *common-dev@* list stating the branch from which they intend to make a Release Candidate, at least one week before they do so. The RM is responsible for building consensus around the content of the Release Candidate, in order to achieve a successful Product Release vote.

===

Please share your views. Best regards, --Matt (long-time release manager)
Re: [VOTE] Plan to create release candidate for 0.23.8
+1 On Friday, May 17, 2013, Thomas Graves wrote: Hello all, We've had a few critical issues come up in 0.23.7 that I think warrants a 0.23.8 release. The main one is MAPREDUCE-5211. There are a couple of other issues that I want finished up and get in before we spin it. Those include HDFS-3875, HDFS-4805, and HDFS-4835. I think those are on track to finish up early next week. So I hope to spin 0.23.8 soon after this vote completes. Please vote '+1' to approve this plan. Voting will close on Friday May 24th at 2:00pm PDT. Thanks, Tom Graves
Re: [VOTE] - Release 2.0.5-beta
On Wed, May 15, 2013 at 1:29 PM, Matt Foley mfo...@hortonworks.com wrote: Arun, not sure whether your "Yes to all" already covered this, but I'd like to throw in support for the compatibility guidelines being a blocker. +1 to that. Definitely an overriding concern for me.

+1 Likewise. Would be great to get more eyeballs on Karthik's patch on HADOOP-9517 if people haven't reviewed it already.
Re: Use hadoop.relaxed.worker.version.check to allow versions in the same major version?
Hey Karthik, We already support this for HDFS, see HDFS-2983 (Relax the build version check to permit rolling upgrades within a release). We should do so for Yarn as well, MAPREDUCE-4150 tracks this. I don't think Ahmed is working on it so would be great for you or someone else to take it. Thanks, Eli On Fri, Apr 26, 2013 at 10:21 AM, Karthik Kambatla ka...@cloudera.com wrote: Hi devs, Given that we have API compatibility within a major release, I was wondering if it would make sense to augment the worker version check to allow workers from a different minor/point release in the same major release? The motivation for this is rolling upgrade within a major release. We could use the same property hadoop.relaxed.worker.version.check for this or add another property. Thoughts? Thanks Karthik
Re: Heads up - 2.0.5-beta
On Fri, Apr 26, 2013 at 11:15 AM, Arun C Murthy a...@hortonworks.com wrote:

On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:

On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote: With that in mind, I really want to make a serious push to lock down APIs and wire-protocols for hadoop-2.0.5-beta. Thus, we can confidently support hadoop-2.x in a compatible manner in the future. So, it's fine to add new features, but please ensure that all APIs are frozen for hadoop-2.0.5-beta.

Arun, since it sounds like you have a pretty definite idea in mind for what you want the 'beta' label to actually mean, could you, please, share the exact criteria?

Sorry, I'm not sure if this is exactly what you are looking for but, as I mentioned above, the primary aim would be to make the final set of required API/wire-protocol changes so that we can call it a 'beta', i.e. once 2.0.5-beta ships, users and downstream projects can be confident about forward compatibility in the hadoop-2.x line. Obviously, we might discover a blocker bug post 2.0.5 which *might* necessitate an unfortunate change - but that should be an outstanding exception.

Arun, Suresh, Mind reviewing the following page Karthik put together on compatibility? http://wiki.apache.org/hadoop/Compatibility I think we should do something similar to what Sanjay proposed in HADOOP-5071 for Hadoop v2. If we get on the same page on compatibility terms/APIs then we can quickly draft the policy, at least for the things we've already got consensus on. I think our new developers, users, downstream projects, and partners would really appreciate us making this clear. If people like the content we can move it to the Hadoop website and maintain it in svn like the bylaws.

The reason I think we need to do so is because there's been confusion about what types of compatibility we promise and some open questions which I'm not sure everyone is clear on. Examples:

- Are we going to preserve Hadoop v3 clients against v2 servers now that we have protobuf support? (I think so..)
- Can we break rolling upgrade of daemons in updates post GA? (I don't think so..)
- Do we disallow HDFS metadata changes that require an HDFS upgrade in an update? (I think so..)
- Can we remove methods from v2 and v2 updates that were deprecated in v0.20-22? (Unclear)
- Will we preserve binary compatibility for MR2 going forward? (I think so..)
- Does the ability to support multiple versions of MR simultaneously via MR2 change the MR API compatibility story? (I don't think so..)
- Are the RM protocols sufficiently stable to disallow incompatible changes potentially required by non-MR projects? (Unclear, most large Yarn deployments I'm aware of are running 0.23, not the v2 alphas)

I'm also not sure there's currently consensus on what an incompatible change is. For example, I think HADOOP-9151 is incompatible because it broke client/server wire compatibility with previous releases, and any change that breaks wire compatibility is incompatible. Suresh felt it was not an incompatible change because it did not affect API compatibility (ie PB is not considered part of the API) and the change occurred while v2 is in alpha. Not sure we need to go through the whole exercise of what's allowed in an alpha and beta (water under the bridge, hopefully), but I do think we should clearly define an incompatible change.

It's fine that v2 has been a bit wild wild west in the alpha development stage but I think we need to get a little more rigorous.

Thanks, Eli
Re: Heads up - 2.0.5-beta
On Fri, Apr 26, 2013 at 2:42 PM, Suresh Srinivas sur...@hortonworks.com wrote:

Eli, I will post a more detailed reply soon. But one small correction: "I'm also not sure there's currently consensus on what an incompatible change is. For example, I think HADOOP-9151 is incompatible because it broke client/server wire compatibility with previous releases and any change that breaks wire compatibility is incompatible. Suresh felt it was not an incompatible change because it did not affect API compatibility (ie PB is not considered part of the API) and the change occurred while v2 is in alpha." This is not correct. I did not say it was not an incompatible change. It was indeed an incompatible wire protocol change. My argument was, given the phase of development we were in, we could not mark the wire protocol as stable and not make any incompatible change. But once 2.0.5-beta is out, as we had discussed earlier, we should not make further incompatible changes to the wire protocol.

Sorry for the confusion, I misinterpreted your comments on the jira (specifically, "This is an incompatible change: I disagree." and "see my argument about why this is not incompatible") to indicate that you thought it was not incompatible.

-- http://hortonworks.com/download/
Fwd: Compatibility in Apache Hadoop
On Mon, Apr 22, 2013 at 5:42 PM, Steve Loughran ste...@hortonworks.com wrote:

On 22 April 2013 14:00, Karthik Kambatla ka...@cloudera.com wrote: Hadoop devs, This doc does not intend to propose new policies. The idea is to have one document that outlines the various compatibility concerns (lots of areas beyond API compatibility), captures the respective policies that exist, and if we want to define policies for the items where it's not clear, we have something to iterate on. The first draft just lists the types of compatibility. In the next step, we can add existing policies and subsequently work towards policies for others.

I don't see -yet- a definition of compatible at the API signature level vs the semantics level. The @ interface attributes say these methods are internal/external/stable/unstable (there's also @VisibleForTesting, which comes out of guava, yes?). There's a separate issue that says we make some guarantee that the behaviour of an interface remains consistent over versions, which is hard to do without some rigorous definition of what the expected behaviour of an implementation should be.

Good point, Steve. I've assumed the semantics of the API had to respect the attribute (eg changing the semantics of FileSystem#close would be an incompatible change, since this is a public/stable API, even if the new semantics are arguably better). But you're right, unless we've actually defined what the semantics of the APIs are it's hard to say if we've materially changed them. How about adding a new section on the page and calling that out explicitly?

In practice I think we'll have to take semantics case by case, clearly define the semantics we care about better in the javadocs (for the major end user-facing classes at least, calling out both intended behavior and behavior that's meant to be undefined) and use individual judgement elsewhere. For example, HDFS-4156 changed DataInputStream#seek to throw an IOE if you seek to a negative offset, instead of succeeding then resulting in an NPE on the next access. That's an incompatible change in terms of semantics, but not semantics intended by the author, or likely semantics programs depend on. However if a change made FileSystem#close three times slower, this is perhaps a smaller semantic change (eg it doesn't change what exceptions get thrown) but probably much less tolerable for end users.

In any case, even if we get an 80% solution to the semantics issue we'll probably be in good shape for v2 GA if we can sort out the remaining topics. See any other topics missing? Once the overall outline is in shape it makes sense to annotate the page with the current policy (if there's already consensus on one), and identify areas where we need to come up with a policy or are leaving TBD. Currently this is a source of confusion for new developers, some downstream projects and users.

Thanks, Eli
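The HDFS-4156 example above is the kind of semantic detail that can be pinned down with a contract-style test. The following is a hypothetical sketch (not an existing Hadoop test; the class and method names are made up for illustration) that encodes the newer behavior of rejecting a negative seek offset up front rather than failing later:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SeekContractSketch {
      // Expectation discussed above: seeking to a negative offset fails fast
      // with an IOException rather than succeeding and surfacing an NPE on the
      // next access (the older behavior that HDFS-4156 changed).
      static void checkNegativeSeekRejected(FileSystem fs, Path file) throws IOException {
        FSDataInputStream in = fs.open(file);
        try {
          in.seek(-1);
          throw new AssertionError("seek(-1) should have thrown an IOException");
        } catch (IOException expected) {
          // expected under the newer semantics
        } finally {
          in.close();
        }
      }
    }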
Re: [Vote] Merge branch-trunk-win to trunk
Bobby raises some good questions. A related one: since most current developers won't add Windows support for new features that are platform specific, is it assumed that Windows development will either lag, or will people actively work on keeping Windows up with the latest? And vice versa in case Windows support is implemented first. Is there a jira for resolving the outstanding TODOs in the code base (similar to HDFS-2148)? Looks like this merge doesn't introduce many, which is great (just did a quick diff and grep). Thanks, Eli

On Wed, Feb 27, 2013 at 8:17 AM, Robert Evans ev...@yahoo-inc.com wrote: After this is merged in, is Windows still going to be a second class citizen that happens to work for more than just development, or is it a fully supported platform where if something breaks it can block a release? How do we as a community intend to keep Windows support from breaking? We don't have any Jenkins slaves to be able to run nightly tests to validate everything still compiles/runs. This is not a blocker for me because we often rely on individuals and groups to test Hadoop, but I do think we need to have this discussion before we put it in. --Bobby

On 2/26/13 4:55 PM, Suresh Srinivas sur...@hortonworks.com wrote: I had posted a heads up about merging branch-trunk-win to trunk on Feb 8th. I am happy to announce that we are ready for the merge. Here is a brief recap on the highlights of the work done:

- Command-line scripts for the Hadoop surface area
- Mapping the HDFS permissions model to Windows
- Abstracted and reconciled mismatches around differences in Path semantics in Java and Windows
- Native Task Controller for Windows
- Implementation of a Block Placement Policy to support cloud environments, more specifically Azure.
- Implementation of Hadoop native libraries for Windows (compression codecs, native I/O)
- Several reliability issues, including race-conditions, intermittent test failures, resource leaks.
- Several new unit test cases written for the above changes

Please find the details of the work in CHANGES.branch-trunk-win.txt - Common changes (http://bit.ly/Xe7Ynv), HDFS changes (http://bit.ly/13QOSo9), and YARN and MapReduce changes (http://bit.ly/128zzMt). This is the work ported from branch-1-win to a branch based on trunk. For details of the testing done, please see the thread - http://bit.ly/WpavJ4. Merge patch for this is available on HADOOP-8562 (https://issues.apache.org/jira/browse/HADOOP-8562).

This was a large undertaking that involved developing code, testing the entire Hadoop stack, including scale tests. This is made possible only with the contribution from many many folks in the community. Following people contributed to this work: Ivan Mitic, Chuan Liu, Ramya Sunil, Bikas Saha, Kanna Karanam, John Gordon, Brandon Li, Chris Nauroth, David Lao, Sumadhur Reddy Bolli, Arpit Agarwal, Ahmed El Baz, Mike Liddell, Jing Zhao, Thejas Nair, Steve Maine, Ganeshan Iyer, Raja Aluri, Giridharan Kesavan, Ramya Bharathi Nimmagadda, Daryn Sharp, Arun Murthy, Tsz-Wo Nicholas Sze, Suresh Srinivas and Sanjay Radia. There are many others who contributed as well providing feedback and comments on numerous jiras. The vote will run for seven days and will end on March 5, 6:00PM PST. Regards, Suresh

On Thu, Feb 7, 2013 at 6:41 PM, Mahadevan Venkatraman mah...@microsoft.com wrote: It is super exciting to look at the prospect of these changes being merged to trunk. Having Windows as one of the supported Hadoop platforms is a fantastic opportunity both for the Hadoop project and Microsoft customers.
This work began around a year back when a few of us started with a basic port of Hadoop on Windows. Ever since, the Hadoop team in Microsoft have made significant progress in the following areas: (PS: Some of these items are already included in Suresh's email, but including again for completeness) - Command-line scripts for the Hadoop surface area - Mapping the HDFS permissions model to Windows - Abstracted and reconciled mismatches around differences in Path semantics in Java and Windows - Native Task Controller for Windows - Implementation of a Block Placement Policy to support cloud environments, more specifically Azure. - Implementation of Hadoop native libraries for Windows (compression codecs, native I/O) - Several reliability issues, including race-conditions, intermittent test failures, resource leaks. - Several new unit test cases written for the above changes In the process, we have closely engaged with the Apache open source community and have got great support and assistance from the community in terms of contributing fixes, code review comments and commits. In addition, the Hadoop team at Microsoft has also made good progress in other projects including Hive, Pig, Sqoop, Oozie, HCat and HBase. Many of these changes have already been committed to the respective trunks with help from various committers and contributors. It is great to see the
Re: Heads up - merge branch-trunk-win to trunk
Thanks for the update, Suresh. Has any testing been done on the branch on Linux aside from running the unit tests? Thanks, Eli On Thu, Feb 7, 2013 at 5:42 PM, Suresh Srinivas sur...@hortonworks.com wrote: The support for Hadoop on Windows was proposed in HADOOP-8079 https://issues.apache.org/jira/browse/HADOOP-8079 almost a year ago. The goal was to make Hadoop natively integrated, full-featured, and performance and scalability tuned on Windows Server or Windows Azure. We are happy to announce that a lot of progress has been made in this regard. Initial work started in a feature branch, branch-1-win, based on branch-1. The details related to the work done in the branch can be seen in CHANGES.txt http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup . This work has been ported to a branch, branch-trunk-win, based on trunk. Merge patch for this is available on HADOOP-8562 https://issues.apache.org/jira/browse/HADOOP-8562 . Highlights of the work done so far: 1. Necessary changes in Hadoop to run natively on Windows. These changes handle differences in platforms related to path names, process/task management etc. 2. Addition of winutils tools for managing file permissions and ownership, user group mapping, hardlinks, symbolic links, chmod, disk utilization, and process/task management. 3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh, start and stop scripts. 4. Addition of a block placement policy implementation to support cloud environments, more specifically Azure. We are very close to wrapping up the work in branch-trunk-win and getting ready for a merge. Currently the merge patch is passing close to 100% of unit tests on Linux. Soon I will call for a vote to merge this branch into trunk. Next steps: 1. Call for a vote to merge branch-trunk-win to trunk, when the work completes and the precommit build is clean. 2. Start a discussion on adding Jenkins precommit builds on Windows and how to integrate that with the existing commit process. Let me know if you have any questions. Regards, Suresh
Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
-1, 0, -1 IIUC the only platform we plan to add support for that we can't easily support today (w/o an emulation layer like cygwin) is Windows, and it seems like making the bash scripts simpler and having parallel bat files is IMO a better approach. On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley ma...@apache.org wrote: For discussion, please see previous thread [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack. This vote consists of three separate items: 1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency. Please vote +1, 0, -1. 2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1. Please vote +1, 0, -1. 3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency. Please vote +1, 0, -1. Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today. Vote closes at 12:30pm PST on Saturday 1 December. - Personally, my vote is +1, +1, +1. I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks. Best regards, --Matt
[jira] [Resolved] (HADOOP-8968) Add a flag to completely disable the worker version check
[ https://issues.apache.org/jira/browse/HADOOP-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8968. - Resolution: Fixed Hadoop Flags: Reviewed I've committed this. Thanks Tucu. Add a flag to completely disable the worker version check - Key: HADOOP-8968 URL: https://issues.apache.org/jira/browse/HADOOP-8968 Project: Hadoop Common Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 1.2.0 Attachments: HADOOP-8968.patch, HADOOP-8968.patch, HADOOP-8968.patch, HADOOP-8968.patch, HADOOP-8968.patch The current logic in the TaskTracker and the DataNode to allow a relaxed version check with the JobTracker and NameNode works only if the versions of Hadoop are exactly the same. We should add a switch to disable version checking completely, to enable rolling upgrades between compatible versions (typically patch versions). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
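As a rough illustration of the switch being proposed, a check like the sketch below would accept any version pair once the flag is set; the property key (hadoop.skip.worker.version.check) and class name here are assumptions for illustration, not necessarily what the committed patch uses:
{noformat}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch only: the property key and method below are
// assumptions, not the actual names added by HADOOP-8968.
public class WorkerVersionCheck {
  public static boolean isAcceptable(Configuration conf,
                                     String workerVersion,
                                     String masterVersion) {
    // If the admin disables the check entirely, any version pair is accepted,
    // which is what permits rolling upgrades across compatible patch releases.
    if (conf.getBoolean("hadoop.skip.worker.version.check", false)) {
      return true;
    }
    // Otherwise keep the strict behavior: versions must match exactly.
    return workerVersion.equals(masterVersion);
  }
}
{noformat}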
[DISCUSS] remove packaging
Hey guys, Heads up: I filed HADOOP-8925 to remove the packaging from trunk and branch-2. The packages are not currently being built, were never updated for MR2/YARN, I'm not aware of anyone planning to do this work or maintain them, etc. No sense in letting them continue to bit rot in the code base. Thanks, Eli
Re: [DISCUSS] remove packaging
Hey Bobby, That's correct, I mean the packages directories in common, hdfs, and MR top-level directories, which contain the debs and RPMs. I'm not opposed to someone re-working/contributing new code as long as they're maintained. Don't think there's any sense in keeping the branch-1 code in trunk/branch-2 since it doesn't support the same daemons, isn't maintained etc. Thanks, Eli On Mon, Oct 15, 2012 at 10:52 AM, Robert Evans ev...@yahoo-inc.com wrote: Eli, By packaging I assume that you mean the RPM/Deb packages and not the tar.gz. If that is the case I have no problem with them being removed because as you said in the JIRA BigTop is already providing a working alternative. If someone else wants to step up to maintain them I also don't have a problem with them staying, so long as they become a part of HADOOP-8914 (Automating the release build). --Bobby -Original Message- From: Eli Collins [mailto:e...@cloudera.com] Sent: Monday, October 15, 2012 10:33 AM To: common-dev@hadoop.apache.org Subject: [DISCUSS] remove packaging Hey guys, Heads up: I filed HADOOP-8925 to remove the packaging from trunk and branch-2. The packages are not currently being built, were never updated for MR2/YARN, I'm not aware of anyone planning to do this work or maintain them, etc. No sense in letting them continue to bit rot in the code base. Thanks, Eli
[jira] [Created] (HADOOP-8931) Add Java version to startup message
Eli Collins created HADOOP-8931: --- Summary: Add Java version to startup message Key: HADOOP-8931 URL: https://issues.apache.org/jira/browse/HADOOP-8931 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Trivial I often look at logs and have to track down the java version they were run with, it would be useful if we logged this as part of the startup message. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8925) Remove packaging
Eli Collins created HADOOP-8925: --- Summary: Remove packaging Key: HADOOP-8925 URL: https://issues.apache.org/jira/browse/HADOOP-8925 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Per discussion on HADOOP-8809, now that Bigtop is a TLP and supports Hadoop v2 let's remove the Hadoop packaging from trunk and branch-2. We should remove it anyway since it is no longer part of the build post mavenization, was not updated post MR1 (there's no MR2/YARN packaging) and is not maintained. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8809) RPMs should skip useradds if the users already exist
[ https://issues.apache.org/jira/browse/HADOOP-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8809. - Resolution: Won't Fix Filed HADOOP-8925. RPMs should skip useradds if the users already exist Key: HADOOP-8809 URL: https://issues.apache.org/jira/browse/HADOOP-8809 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 1.0.3 Reporter: Steve Loughran Priority: Minor The hadoop.spec preinstall script creates users, but it does this even if they already exist. This may cause problems if the installation already has those users with different uids. A check with {{id}} can avoid this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8914) Automate release builds
Eli Collins created HADOOP-8914: --- Summary: Automate release builds Key: HADOOP-8914 URL: https://issues.apache.org/jira/browse/HADOOP-8914 Project: Hadoop Common Issue Type: Task Reporter: Eli Collins Hadoop releases are currently created manually by the RM (following http://wiki.apache.org/hadoop/HowToRelease), which means various aspects of the build are ad hoc, eg what tool chain was used to compile the java and native code varies from release to release. Other steps can be inconsistent since they're done manually eg recently the checksums for an RC were incorrect. Let's use the jenkins toolchain and create a job that automates creating release builds so that the only manual thing about releasing is publishing to mvn central. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [VOTE] Hadoop-1.0.4-rc0
On Tue, Oct 9, 2012 at 1:02 PM, Matt Foley mfo...@hortonworks.com wrote: Hi Eli, Thanks for the suggestion. Looks like this has gotten fleshed out a little more since I started doing releases. I've had my key posted at MIT since the beginning. I've now also uploaded it to the PGP Global Directory, and uploaded the key fingerprint to my profile at id.apache.org. However, when I tried to commit it to KEYS http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS, I got error svn: access to '/repos/asf/!svn/act/69e4489f-fcdc-45b3-9a14-637b3a078b13' forbidden. Also, at http://svn.apache.org/repos/asf/hadoop/common/dist/readme.txt it says: To generate the KEYS file, use: % wget https://people.apache.org/keys/group/hadoop.asc KEYS which would seem to argue against simply committing changes to KEYS. Yet the file at https://people.apache.org/keys/group/hadoop.asc is considerably behind the file at http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS Any idea what the correct resolution of this is? Or should I ping infra@ for the correct instructions? Bobby, you just updated the KEYS file, right? Did you run into any of these issues? Thanks, Eli
Re: Need to add fs shim to use QFS
Hey Thilee, Thanks for contributing. We don't process pull requests on the git mirrors; please upload a patch against trunk and branch-1 if you'd like this included in Hadoop 1.x and 2.x releases. More info here: http://wiki.apache.org/hadoop/HowToContribute Thanks, Eli On Fri, Oct 5, 2012 at 10:27 AM, Thilee Subramaniam thi...@quantcast.com wrote: We at Quantcast have released QFS 1.0 (Quantcast File System) to open source. This is based on the KFS 0.5 (Kosmos Distributed File System), a C++ distributed filesystem implementation. KFS plugs into Apache Hadoop via the 'kfs' shim that is part of the Hadoop codebase. QFS has added support for permissions, and also provides fault tolerance through Reed-Solomon encoding as well as replication. There are also a number of performance and stability improvements, including a rewrite of the client library to allow parallel concurrent I/Os. Going forward, new releases of KFS will come from QFS. The open source release of QFS is at http://quantcast.github.com/qfs QFS plugs into Apache Hadoop the same way KFS does. Currently, one would apply the patches or JARs from the QFS source tree onto Apache Hadoop to make Hadoop use QFS. The patch for Apache Hadoop 1.0.X can be found at https://github.com/quantcast/qfs/blob/master/hadoop/hadoop-1.0.X.patch In order to make the integration seamless, we would like to add a 'qfs' shim to Apache Hadoop so that the current active branches (1.0.X, 2.X.X, 0.23.X) of Apache Hadoop can use QFS. Towards this, I've submitted an ASF JIRA feature ticket (HADOOP-8885) under the hadoop-common project, and sent a pull request with the QFS shim changes to https://github.com/apache/hadoop-common/tree/branch-1.0.2 I will subsequently submit pull requests to the other active Hadoop branches. If you have any questions, I will be happy to answer or provide more details on QFS. - Thilee
[jira] [Created] (HADOOP-8886) Remove KFS support
Eli Collins created HADOOP-8886: --- Summary: Remove KFS support Key: HADOOP-8886 URL: https://issues.apache.org/jira/browse/HADOOP-8886 Project: Hadoop Common Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins KFS is no longer maintained (is replaced by QFS, which HADOOP-8885 is adding), let's remove it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [VOTE] Hadoop-1.0.4-rc0
+1 Thanks Matt. I verified the signatures and ran some basic MR jobs. Btw, it would be good to update the public KEYS repo with your key as well. See http://wiki.apache.org/hadoop/HowToRelease. Thanks, Eli On Thu, Oct 4, 2012 at 1:59 PM, Matt Foley ma...@apache.org wrote: Hi, There has been a request from several PMC members for a maintenance release of hadoop-1.0. Please download and test this release candidate, Hadoop-1.0.4-rc0. It is available at http://people.apache.org/~mattf/hadoop-1.0.4-rc0/ or from the Nexus maven repo. Release notes are at http://people.apache.org/~mattf/hadoop-1.0.4-rc0/releasenotes.html Vote will run for 1 week as usual, terminating at 2pm PDT, Thur 11 Oct 2012. Thank you, --Matt Foley Release Manager
[jira] [Created] (HADOOP-8873) Port HADOOP-8175 (Add mkdir -p flag) to branch-1
Eli Collins created HADOOP-8873: --- Summary: Port HADOOP-8175 (Add mkdir -p flag) to branch-1 Key: HADOOP-8873 URL: https://issues.apache.org/jira/browse/HADOOP-8873 Project: Hadoop Common Issue Type: Improvement Affects Versions: 1.0.0 Reporter: Eli Collins Per HADOOP-8551 let's port the mkdir -p option to branch-1 for a 1.x release to help users transition to the new shell behavior. In Hadoop 2.x mkdir currently requires the -p option to create parent directories but a program that specifies it won't work on 1.x since it doesn't support this option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8856) SecuirtyUtil#openSecureHttpConnection should use an authenticated URL even if kerberos is not enabled
Eli Collins created HADOOP-8856: --- Summary: SecuirtyUtil#openSecureHttpConnection should use an authenticated URL even if kerberos is not enabled Key: HADOOP-8856 URL: https://issues.apache.org/jira/browse/HADOOP-8856 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins HADOOP-8581 updated openSecureHttpConnection to use an ssl factory, however we only use it if kerberos security is enabled, so we'll fail to use it if SSL is enabled but kerberos/SPNEGO are not. This manifests itself as the 2NN failing to checkpoint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8856) SecuirtyUtil#openSecureHttpConnection should use an authenticated URL even if kerberos is not enabled
[ https://issues.apache.org/jira/browse/HADOOP-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8856. - Resolution: Duplicate Dupe of HADOOP-8855 SecuirtyUtil#openSecureHttpConnection should use an authenticated URL even if kerberos is not enabled - Key: HADOOP-8856 URL: https://issues.apache.org/jira/browse/HADOOP-8856 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins HADOOP-8581 updated openSecureHttpConnection to use an ssl factory, however we only use it if kerberos security is enabled, so we'll fail to use it if SSL is enabled but kerberos/SPNEGO are not. This manifests itself as the 2NN failing to checkpoint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8857) hadoop.http.authentication.signature.secret.file should be created if the configured file does not exist
Eli Collins created HADOOP-8857: --- Summary: hadoop.http.authentication.signature.secret.file should be created if the configured file does not exist Key: HADOOP-8857 URL: https://issues.apache.org/jira/browse/HADOOP-8857 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.0-alpha Reporter: Eli Collins Priority: Minor AuthenticationFilterInitializer#initFilter fails if the configured {{hadoop.http.authentication.signature.secret.file}} does not exist, eg: {noformat} java.lang.RuntimeException: Could not read HTTP signature secret file: /var/lib/hadoop-hdfs/hadoop-http-auth-signature-secret {noformat} Creating /var/lib/hadoop-hdfs/hadoop-http-auth-signature-secret (populated with a string) fixes the issue. Per the auth docs, "If a secret is not provided a random secret is generated at start up time", which sounds like it means the file should be generated at startup with a random secret, which doesn't seem to be the case. Also, the instructions in the docs should be clearer in this regard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
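A minimal sketch of the behavior the report asks for, i.e. generating the file with a random secret at startup if it is missing; the helper class below is hypothetical and is not the actual AuthenticationFilterInitializer change:
{noformat}
import java.io.File;
import java.io.FileWriter;
import java.math.BigInteger;
import java.security.SecureRandom;

// Hypothetical helper, not the actual Hadoop code: if the configured secret
// file is missing, create it with a random secret instead of failing at startup.
public class SignatureSecretFile {
  public static void ensureExists(String path) throws Exception {
    File file = new File(path);
    if (!file.exists()) {
      // 130 random bits rendered in base 32 gives a reasonably long secret.
      String secret = new BigInteger(130, new SecureRandom()).toString(32);
      FileWriter writer = new FileWriter(file);
      try {
        writer.write(secret);
      } finally {
        writer.close();
      }
    }
  }
}
{noformat}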
[jira] [Created] (HADOOP-8859) Improve SecurityUtil#openSecureHttpConnection javadoc
Eli Collins created HADOOP-8859: --- Summary: Improve SecurityUtil#openSecureHttpConnection javadoc Key: HADOOP-8859 URL: https://issues.apache.org/jira/browse/HADOOP-8859 Project: Hadoop Common Issue Type: Improvement Components: documentation, security Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Per HADOOP-8855 SecurityUtil#openSecureHttpConnection is not SPNEGO specific since it supports other authenticators so we should update the javadoc to reflect that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Commits breaking compilation of MR 'classic' tests
How about adding this step to the MR PreCommit jenkins job so it's run as part of test-patch? On Tue, Sep 25, 2012 at 7:48 PM, Arun C Murthy a...@hortonworks.com wrote: Committers, As most people are aware, the MapReduce 'classic' tests (in hadoop-mapreduce-project/src/test) still need to be built using ant since they aren't mavenized yet. I've seen several commits (and 2 within the last hour i.e. MAPREDUCE-3681 and MAPREDUCE-3682) which lead me to believe developers/committers aren't checking for this. Henceforth, with all changes, before committing, please do run: $ mvn install $ cd hadoop-mapreduce-project $ ant veryclean all-jars -Dresolvers=internal These instructions were already in http://wiki.apache.org/hadoop/HowToReleasePostMavenization and I've just updated http://wiki.apache.org/hadoop/HowToContribute. thanks, Arun
[jira] [Created] (HADOOP-8812) ExitUtil#terminate should print Exception#toString
Eli Collins created HADOOP-8812: --- Summary: ExitUtil#terminate should print Exception#toString Key: HADOOP-8812 URL: https://issues.apache.org/jira/browse/HADOOP-8812 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Per Steve's feedback, ExitUtil#terminate should print Exception#toString rather than use getMessage, as the latter may return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
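A tiny standalone example of why toString is the safer choice here: an exception constructed without a message yields null from getMessage but still prints its class name via toString.
{noformat}
public class MessageVsToString {
  public static void main(String[] args) {
    Throwable t = new NullPointerException();  // constructed without a message
    System.out.println(t.getMessage());        // prints: null
    System.out.println(t.toString());          // prints: java.lang.NullPointerException
  }
}
{noformat}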
[jira] [Created] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace
Eli Collins created HADOOP-8801: --- Summary: ExitUtil#terminate should capture the exception stack trace Key: HADOOP-8801 URL: https://issues.apache.org/jira/browse/HADOOP-8801 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8801.txt ExitUtil#terminate(status,Throwable) should capture and log the stack trace of the given throwable. This will help debug issues like HDFS-3933. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
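A sketch of capturing a throwable's stack trace as a string so it can be logged before the process exits, using only standard JDK classes; the helper name is illustrative and is not the actual ExitUtil change:
{noformat}
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceCapture {
  // Render the full stack trace as a string so it can be logged before
  // the process exits, rather than being lost.
  public static String stringify(Throwable t) {
    StringWriter sw = new StringWriter();
    t.printStackTrace(new PrintWriter(sw, true));
    return sw.toString();
  }

  public static void main(String[] args) {
    System.err.println(stringify(new IllegalStateException("boom")));
  }
}
{noformat}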
[jira] [Created] (HADOOP-8804) Improve Web UIs when the wildcard address is used
Eli Collins created HADOOP-8804: --- Summary: Improve Web UIs when the wildcard address is used Key: HADOOP-8804 URL: https://issues.apache.org/jira/browse/HADOOP-8804 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha, 1.0.0 Reporter: Eli Collins Priority: Minor When IPC addresses are bound to the wildcard (ie the default config) the NN, JT (and probably RM etc) Web UIs are a little goofy. Eg 0 Hadoop Map/Reduce Administration and NameNode '0.0.0.0:18021' (active). Let's improve them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8807) Update README and website to reflect HADOOP-8662
Eli Collins created HADOOP-8807: --- Summary: Update README and website to reflect HADOOP-8662 Key: HADOOP-8807 URL: https://issues.apache.org/jira/browse/HADOOP-8807 Project: Hadoop Common Issue Type: Bug Components: documentation Reporter: Eli Collins HADOOP-8662 removed the various tabs from the website. Our top-level README.txt and the generated docs refer to them (eg hadoop.apache.org/core, /hdfs etc). Let's fix that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: test-patch and native code compile
Yea, we want jenkins to run with native. How about making native optional in test-patch via a flag and updating the jenkins jobs to use it? On Thu, Sep 6, 2012 at 7:25 AM, Alejandro Abdelnur t...@cloudera.com wrote: Makes sense, though the Jenkins runs should continue to run w/ native, right? On Thu, Sep 6, 2012 at 12:49 AM, Hemanth Yamijala yhema...@gmail.com wrote: Hi, The test-patch script in Hadoop source runs a native compile with the patch. On platforms like MAC, there are issues with the native compile. For e.g. we run into HADOOP-7147, which has been resolved as Won't Fix. Hence, should we have a switch in test-patch to not run the native compile? I could open a JIRA and fix it, if that's OK? Thanks hemanth -- Alejandro
Re: Branch 2 release names
On Tue, Sep 4, 2012 at 11:55 AM, Owen O'Malley omal...@apache.org wrote: While cleaning up the subversion branches, I thought more about the branch 2 release names. I'm concerned if we backtrack and reuse release numbers it will be extremely confusing to users. It also creates problems for tools like Maven that parse version numbers and expect a left to right release numbering scheme (eg. 2.1.1-alpha 2.1.0). It also seems better to keep on the 2.0.x minor release until after we get a GA release off of the 2.0 branch. Therefore, I'd like to propose: 1. rename branch-2.0.1-alpha - branch-2.0 2. delete branch-2.1.0-alpha 3. stabilizing goes into branch-2.0 until it gets to GA 4. features go into branch-2 and will be branched into branch-2.1 later 5. The release tags can have the alpha/beta tags on them. Thoughts? branch-2.0.1-alpha is pretty far behind branch-2 at this point, both in terms of features merged to branch-2 (eg no auto failover or hsync) and bug fixes (iiuc it is just 2.0 plus a couple changes). From my hdfs pov the branch doesn't seem worth maintaining. I'd tweak this as follows: 1. delete branch-2.1.0-alpha 2. rename branch-2 - branch-2.0 some time after 0.23.3 is released 3. stabilizing goes into branch-2.0 until it gets to GA 4. features go into branch-2 and will be branched into branch-2.1 later 5. The release tags can have the alpha/beta tags on them. On the hdfs side most trunk work is for branch-2 so we're already merging trunk - branch-2, so delaying the third merge would help, and we're using feature branches for the big stuff (HDFS-3077) so they're being isolated that way. Thanks, Eli
[jira] [Created] (HADOOP-8769) Tests failures on the ARM hosts
Eli Collins created HADOOP-8769: --- Summary: Tests failures on the ARM hosts Key: HADOOP-8769 URL: https://issues.apache.org/jira/browse/HADOOP-8769 Project: Hadoop Common Issue Type: Test Components: build Affects Versions: 2.0.0-alpha Reporter: Eli Collins I created a [jenkins job|https://builds.apache.org/job/Hadoop-trunk-ARM] that runs on the ARM machines. The local build is now working and running tests (thanks Gavin!), however there are 40 test failures, looks like most are due to host configuration issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8722) Update BUILDING.txt with latest snappy info
[ https://issues.apache.org/jira/browse/HADOOP-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8722. - Resolution: Fixed Fix Version/s: 2.2.0-alpha Target Version/s: (was: 2.2.0-alpha) Hadoop Flags: Reviewed I've committed this, thanks Colin! Update BUILDING.txt with latest snappy info --- Key: HADOOP-8722 URL: https://issues.apache.org/jira/browse/HADOOP-8722 Project: Hadoop Common Issue Type: Bug Components: documentation Affects Versions: 2.2.0-alpha Reporter: Eli Collins Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.2.0-alpha Attachments: HADOOP-8722.002.patch, HADOOP-8722.003.patch, HADOOP-8722.004.patch, hadoop-8722.txt HADOOP-8620 changed the default of snappy.lib from snappy.prefix/lib to empty. This, in turn, means that you can't use {{-Dbundle.snappy}} without setting {{-Dsnappy.lib}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8752) Update website to reflect merged committer lists
Eli Collins created HADOOP-8752: --- Summary: Update website to reflect merged committer lists Key: HADOOP-8752 URL: https://issues.apache.org/jira/browse/HADOOP-8752 Project: Hadoop Common Issue Type: Task Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Let's update the website to reflect the merged committer list (http://s.apache.org/Owx), ie one top-level credits section next to the PMC section. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8744) Fix the OSX native build
Eli Collins created HADOOP-8744: --- Summary: Fix the OSX native build Key: HADOOP-8744 URL: https://issues.apache.org/jira/browse/HADOOP-8744 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 2.2.0-alpha Reporter: Eli Collins Per HADOOP-8737 let's fix the OSX native build, which was working as of HADOOP-3659 (v 0.21). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8740) Build target to generate findbugs html output
Eli Collins created HADOOP-8740: --- Summary: Build target to generate findbugs html output Key: HADOOP-8740 URL: https://issues.apache.org/jira/browse/HADOOP-8740 Project: Hadoop Common Issue Type: Improvement Components: build Reporter: Eli Collins It would be useful if there was a build target or flag to generate findbugs output. It would depend on {{mvn compile findbugs:findbugs}} and run {{$FINDBUGS_HOME/bin/convertXmlToText -html ../path/to/findbugsXml.xml findbugs.html}} to generate findbugs.html in the target directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8722) Update BUILDING.txt to indicate snappy.lib is required
Eli Collins created HADOOP-8722: --- Summary: Update BUILDING.txt to indicate snappy.lib is required Key: HADOOP-8722 URL: https://issues.apache.org/jira/browse/HADOOP-8722 Project: Hadoop Common Issue Type: Improvement Components: documentation Reporter: Eli Collins Priority: Minor HADOOP-8620 changed the default of snappy.lib from snappy.prefix/lib to empty, which means we need to update the following in BUILDING.txt to indicate it needs to be set (or restore the default to snappy.prefix/lib). {noformat} * Use -Dsnappy.prefix=(/usr/local) -Dbundle.snappy=(false) to compile {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8710) Remove ability for users to easily run the trash emptier
Eli Collins created HADOOP-8710: --- Summary: Remove ability for users to easily run the trash emptier Key: HADOOP-8710 URL: https://issues.apache.org/jira/browse/HADOOP-8710 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.0.0-alpha, 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hadoop-8710.txt Users can currently run the emptier via {{hadoop org.apache.hadoop.fs.Trash}}, which seems error prone as there's nothing in that command that suggests it runs the emptier and nothing that asks you before deleting the trash for all users (that the current user is capable of deleting). Given that the trash emptier runs server side (eg on the NN) let's remove the ability to easily run it client side. Marking as an incompatible change since someone expecting the hadoop command with this class specified to empty trash will no longer be able to (they'll need to create their own class that does this). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: S3 FS tests broken?
Trevor, Forgot to ask, since you can reproduce this can you confirm and see why S3Conf.get is returning null for test.fs.s3.name? On Mon, Aug 13, 2012 at 6:35 PM, Eli Collins e...@cloudera.com wrote: Passes for me locally, and the precondition that's failing (passing null to Conf#set) from the backtrace looks like the null is coming from: S3Conf.set(FS_DEFAULT_NAME_DEFAULT, S3Conf.get(test.fs.s3.name)); which is set in core-site.xml so something strange is going on. HADOOP-6296 looks related btw. On Mon, Aug 13, 2012 at 6:04 PM, Trevor tre...@scurrilous.com wrote: Anyone know why these tests have started failing? It happens for me locally and it just happened in Jenkins: https://builds.apache.org/job/PreCommit-HADOOP-Build/1288/ I don't see any obvious changes recently that would cause it. Tests in error: testCreateFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateFileWithNullName(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateFileInNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testIsDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteNonExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteNonExistingFileInDir(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testModificationTime(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testFileStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testListStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testBlockSize(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testFsStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWorkingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testMkdirs(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testListStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteEmptyFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteHalfABlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) 
testWriteReadAndDeleteOneBlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteOneAndAHalfBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteTwoBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testOverwrite(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteInNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testDeleteNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testDeleteRecursively(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testDeleteEmptyDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameNonExistentPath(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileMoveToExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileAsExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileAsExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameDirectoryMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameDirectoryMoveToExistingDirectory(org.apache.hadoop.fs.s3
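For context, a minimal sketch of how the failure above can arise, assuming Configuration#set rejects null values with the "Property value must not be null" message seen in the report; the key passed to set() is illustrative rather than the exact test code:
{noformat}
import org.apache.hadoop.conf.Configuration;

public class NullPropertyRepro {
  public static void main(String[] args) {
    // An empty Configuration mimics a test classpath that is missing the
    // core-site.xml entry for test.fs.s3.name.
    Configuration conf = new Configuration(false);
    String value = conf.get("test.fs.s3.name");  // null: the key was never set
    // Passing that null on to set() is what surfaces as the
    // "Property value must not be null" IllegalArgumentException above.
    conf.set("fs.defaultFS", value);
  }
}
{noformat}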
[jira] [Created] (HADOOP-8689) Make trash a server side configuration option
Eli Collins created HADOOP-8689: --- Summary: Make trash a server side configuration option Key: HADOOP-8689 URL: https://issues.apache.org/jira/browse/HADOOP-8689 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Per ATM's suggestion in HADOOP-8598 for v2 let's make {{fs.trash.interval}} configured server side rather than client side. The {{fs.trash.checkpoint.interval}} option is already server side as the emptier runs in the NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8690) Shell may remove a file without going to trash even if skipTrash is not enabled
Eli Collins created HADOOP-8690: --- Summary: Shell may remove a file without going to trash even if skipTrash is not enabled Key: HADOOP-8690 URL: https://issues.apache.org/jira/browse/HADOOP-8690 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Priority: Minor Delete.java contains the following comment: {noformat} // TODO: if the user wants the trash to be used but there is any // problem (ie. creating the trash dir, moving the item to be deleted, // etc), then the path will just be deleted because moveToTrash returns // false and it falls thru to fs.delete. this doesn't seem right {noformat} If Trash#moveToAppropriateTrash returns false, FsShell will delete the path even if skipTrash is not enabled. The comment isn't quite right as some of these failure scenarios result in exceptions, not a false return value, and in the case of an exception we don't unconditionally delete the path. TrashPolicy#moveToTrash states that it only returns false if the item is already in the trash or trash is disabled, and the expected behavior for these cases is to just delete the path. However TrashPolicyDefault#moveToTrash also returns false if there's a problem creating the trash directory, so for this case I think we should throw an exception rather than return false. I also question the behavior of just deleting when the item is already in the trash, as it may have changed since it was previously put in the trash and has not been checkpointed yet. Seems like in this case we should move it to trash but with a file name suffix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
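A condensed sketch of the fall-through the TODO describes, under the assumption that the shell's delete path looks roughly like this; it is not the actual Delete.java source:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

// Condensed sketch of the fall-through described above: any false return
// from moveToAppropriateTrash leads straight to a permanent delete, even
// though -skipTrash was not given.
public class TrashFallthrough {
  static void remove(FileSystem fs, Path path, Configuration conf,
                     boolean skipTrash) throws Exception {
    if (!skipTrash && Trash.moveToAppropriateTrash(fs, path, conf)) {
      return;              // moved to trash, nothing more to do
    }
    fs.delete(path, true); // fall-through: the path is gone for good
  }
}
{noformat}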
Re: S3 FS tests broken?
Passes for me locally, and the precondition that's failing (passing null to Conf#set) from the backtrace looks like the null is coming from: S3Conf.set(FS_DEFAULT_NAME_DEFAULT, S3Conf.get(test.fs.s3.name)); which is set in core-site.xml so something strange is going on. HADOOP-6296 looks related btw. On Mon, Aug 13, 2012 at 6:04 PM, Trevor tre...@scurrilous.com wrote: Anyone know why these tests have started failing? It happens for me locally and it just happened in Jenkins: https://builds.apache.org/job/PreCommit-HADOOP-Build/1288/ I don't see any obvious changes recently that would cause it. Tests in error: testCreateFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateFileWithNullName(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateFileInNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testCreateDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testIsDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteNonExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteNonExistingFileInDir(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testDeleteNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testModificationTime(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testFileStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testListStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property value must not be null testBlockSize(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testFsStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWorkingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testMkdirs(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testListStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteEmptyFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteHalfABlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteOneBlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteReadAndDeleteOneAndAHalfBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) 
testWriteReadAndDeleteTwoBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testOverwrite(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testWriteInNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testDeleteNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testDeleteRecursively(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testDeleteEmptyDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameNonExistentPath(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileMoveToExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileAsExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameFileAsExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameDirectoryMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameDirectoryMoveToExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract) testRenameDirectoryAsExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
[jira] [Created] (HADOOP-8687) Bump log4j to version 1.2.17
Eli Collins created HADOOP-8687: --- Summary: Bump log4j to version 1.2.17 Key: HADOOP-8687 URL: https://issues.apache.org/jira/browse/HADOOP-8687 Project: Hadoop Common Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Let's bump log4j from 1.2.15 to version 1.2.17. It and 16 are maintenance releases with good fixes and also remove some jar dependencies (javamail, jmx, jms). http://logging.apache.org/log4j/1.2/changes-report.html#a1.2.17 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: fs.local.block.size vs file.blocksize
Hi Ellis, fs.local.block.size is the default FileSystem block size, note however that most file systems (like HDFS, see DistributedFileSystem) override this, eg when using HDFS the default block size is configured with dfs.blocksize which defaults to 64mb. Note in v1 the default block size for hdfs was 64mb as well (configured via dfs.block.size, which dfs.blocksize replaces). Thanks, Eli On Sat, Aug 11, 2012 at 7:55 AM, Ellis H. Wilson III el...@cse.psu.edu wrote: Hi guys and gals, I originally posted a version of this question on the user list on a few days ago to no response, so I thought perhaps it delved a bit too far into the nitty-gritty to warrant one. My apologies for cross-listing. Can someone please briefly summarize the difference between these two parameters? I do not see deprecated warnings for fs.local.block.size when I run with it set. Furthermore, and I'm unsure if this is related, I see two copies of what is effectively RawLocalFileSystem.java (the other is local/RawLocalFs.java). It appears that the one in local/ is for the old abstract FileSystem class, whereas RawLocalFileSystem.java uses the new abstract class. Perhaps this is the root cause of the two parameters? Or does file.blocksize simply control the abstract class or some such thing? The practical answers I really need to get a handle on are the following: 1. Is the default for the file:// filesystem boosted to a 64MB blocksize in Hadoop 2.0? It was only 32MB in Hadoop 1.0, but it's not 100% clear to me that it is now a full 64MB. The core-site.xml docs online suggest it's been boosted. 2. If I alter the blocksize of file://, is it correct to presume that also will impact the shuffle block-size since that data goes locally? Thanks! ellis
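A small example of checking what each layer reports, assuming a standard client Configuration; the 32MB fallback used here is illustrative rather than authoritative, and the default FileSystem's value reflects dfs.blocksize when that filesystem is HDFS:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class BlockSizeCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The generic setting discussed above; the 32MB fallback is only an
    // illustrative default, check core-default.xml for your version.
    long fsLocal = conf.getLong("fs.local.block.size", 32 * 1024 * 1024);
    System.out.println("fs.local.block.size = " + fsLocal);
    // What the configured default FileSystem actually reports; for HDFS this
    // reflects dfs.blocksize, which overrides the generic value.
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.getUri() + " default block size = "
        + fs.getDefaultBlockSize());
  }
}
{noformat}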
Re: adding bin and net categories to JIRA/HADOOP
Works for me. I've been adding missing ones that make sense (eg webhdfs). On Mon, Aug 6, 2012 at 11:34 AM, Steve Loughran ste...@hortonworks.com wrote: There aren't bin and net categories in JIRA, yet often bugs go against the code there? Should I add them?
Re: Where can I find old Hadoop Common Release Binaries?
Hey Carlos, We don't release common separately from mr and hdfs. You can download the hadoop binaries from this site (or preferably its mirror). http://apache.claz.org/hadoop/common Eg the 0.20.2 binary is here: http://apache.claz.org/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz Thanks, Eli On Sat, Aug 4, 2012 at 8:01 PM, Carlos Andrade carlosvia...@gmail.com wrote: Dears, I am part of a multi-institutional group that is doing some data analysis on different open source projects and we selected Hadoop Common to be part of the data set. I would very much appreciate it if one of you could confirm whether Hadoop Common's only available binaries are located at http://www.apache.org/dyn/closer.cgi/hadoop/common/ I noticed that for some releases, like hadoop-0.20.2, the mirror doesn't seem to have the binaries available. Carlos Andrade http://carlosandrade.co
[jira] [Created] (HADOOP-8642) io.native.lib.available only controls zlib
Eli Collins created HADOOP-8642: --- Summary: io.native.lib.available only controls zlib Key: HADOOP-8642 URL: https://issues.apache.org/jira/browse/HADOOP-8642 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 2.0.0-alpha Reporter: Eli Collins Per core-default.xml, {{io.native.lib.available}} indicates "Should native hadoop libraries, if present, be used", however it looks like it only affects zlib. Since we always load the native library, this means we may use native libraries even if io.native.lib.available is set to false. Let's make the flag work as advertised - rather than always loading the native hadoop library, we only attempt to load the library (and report that native is available) if this flag is set. Since io.native.lib.available defaults to true, the default behavior should remain unchanged (except that now we won't actually try to load the library if this flag is disabled). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
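A sketch of what "only attempt to load the library if this flag is set" could look like; the class and method names here are illustrative, and the real loading currently happens unconditionally in NativeCodeLoader's static initializer:
{noformat}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch only; this is not the actual NativeCodeLoader code.
public class GuardedNativeLoader {
  private static boolean loaded = false;

  public static synchronized boolean loadIfEnabled(Configuration conf) {
    if (!conf.getBoolean("io.native.lib.available", true)) {
      return false;                    // flag disabled: never touch libhadoop
    }
    if (!loaded) {
      try {
        System.loadLibrary("hadoop");  // libhadoop.so / hadoop.dll
        loaded = true;
      } catch (UnsatisfiedLinkError e) {
        loaded = false;
      }
    }
    return loaded;
  }
}
{noformat}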
Re: Java 7 and Hadoop
We should officially support it in Hadoop (one piece of BIGTOP-458, which is for supporting it across the whole stack). Now that HADOOP-8370 (which fixes native compilation) is in, the full tarball should work. A good next step would be updating JAVA_HOME on one of the Hadoop jenkins jobs to use jdk7. On Tue, Jul 31, 2012 at 9:17 AM, Thomas Graves tgra...@yahoo-inc.com wrote: I've seen more and more people using java 7. We are also planning to move to java 7 due to the eol of java 6 that Scott referenced. What are folks' thoughts on making it officially supported by Hadoop? Is there a process for this or is it simply updating the wiki Eli mentioned after sufficient testing? Thanks, Tom On 4/26/12 4:25 PM, Eli Collins e...@cloudera.com wrote: Hey Scott, Nice. Please update this page with your experience when you get a chance: http://wiki.apache.org/hadoop/HadoopJavaVersions Thanks, Eli On Thu, Apr 26, 2012 at 2:03 PM, Scott Carey sc...@richrelevance.com wrote: Java 7 update 4 has been released. It is even available for MacOS X from Oracle: http://www.oracle.com/technetwork/java/javase/downloads/jdk-7u4-downloads-1591156.html Java 6 will reach end of life in about 6 months. After that point, there will be no more public updates from Oracle for Java 6, even security updates. https://blogs.oracle.com/henrik/entry/updated_java_6_eol_date You can of course pay them for updates or build your own OpenJDK. The entire Hadoop ecosystem needs to test against Java 7 JDKs this year. I will be testing some small clusters of ours with JDK 7 in about a month, and my internal projects will start using Java 7 features shortly after. See the JDK roadmap: http://blogs.oracle.com/javaone/resource/java_keynote/slide_15_full_size.gif https://blogs.oracle.com/java/entry/moving_java_forward_java_strategy
Re: Build failed in Jenkins: Hadoop-Common-trunk #481
I fixed up the command jenkins was running to hopefully fix this error. On Tue, Jul 24, 2012 at 2:27 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/Hadoop-Common-trunk/481/changes Changes: [eli] HDFS-3709. TestStartup tests still binding to the ephemeral port. Contributed by Eli Collins [bobby] MAPREDUCE-3893. allow capacity scheduler configs max-apps and max-am-pct per queue (tgraves via bobby) [todd] HDFS-3697. Enable fadvise readahead by default. Contributed by Todd Lipcon. -- [...truncated 18195 lines...] [DEBUG] (s) reportsDirectory = https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/surefire-reports [DEBUG] (s) runOrder = filesystem [DEBUG] (s) session = org.apache.maven.execution.MavenSession@98f352 [DEBUG] (s) skip = false [DEBUG] (s) skipTests = false [DEBUG] (s) systemPropertyVariables = {hadoop.log.dir=null, hadoop.tmp.dir=null, java.net.preferIPv4Stack=true, java.security.egd=file:///dev/urandom, java.security.krb5.conf=https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/src/test/resources/krb5.conf, test.build.classes=null, test.build.data=https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir, test.build.dir=https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir, test.build.webapps=null, test.cache.data=null} [DEBUG] (s) testClassesDirectory = https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-classes [DEBUG] (s) testFailureIgnore = false [DEBUG] (s) testNGArtifactName = org.testng:testng [DEBUG] (s) testSourceDirectory = https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/src/test/java [DEBUG] (s) trimStackTrace = true [DEBUG] (s) useFile = true [DEBUG] (s) useManifestOnlyJar = true [DEBUG] (s) useSystemClassLoader = true [DEBUG] (s) useUnlimitedThreads = false [DEBUG] (s) workingDirectory = https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples [DEBUG] -- end configuration -- [INFO] No tests to run. 
[INFO] Surefire report directory: https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/surefire-reports [DEBUG] dummy:dummy:jar:1.0 (selected for null) [DEBUG] org.apache.maven.surefire:surefire-booter:jar:2.12:compile (selected for compile) [DEBUG] org.apache.maven.surefire:surefire-api:jar:2.12:compile (selected for compile) [DEBUG] Adding to surefire booter test classpath: /home/jenkins/.m2/repository/org/apache/maven/surefire/surefire-booter/2.12/surefire-booter-2.12.jar Scope: compile [DEBUG] Adding to surefire booter test classpath: /home/jenkins/.m2/repository/org/apache/maven/surefire/surefire-api/2.12/surefire-api-2.12.jar Scope: compile [DEBUG] Setting system property [java.net.preferIPv4Stack]=[true] [DEBUG] Setting system property [java.security.krb5.conf]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/src/test/resources/krb5.conf] [DEBUG] Setting system property [tar]=[true] [DEBUG] Setting system property [test.build.data]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir] [DEBUG] Setting system property [user.dir]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples] [DEBUG] Setting system property [localRepository]=[/home/jenkins/.m2/repository] [DEBUG] Setting system property [test.build.dir]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir] [DEBUG] Setting system property [basedir]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples] [DEBUG] Setting system property [java.security.egd]=[file:///dev/urandom] [DEBUG] Using JVM: /home/jenkins/tools/java/jdk1.6.0_27/jre/bin/java [DEBUG] Setting environment variable [LD_LIBRARY_PATH]=[/home/jenkins/tools/java/jdk1.6.0_27/jre/lib/i386/server:/home/jenkins/tools/java/jdk1.6.0_27/jre/lib/i386:/home/jenkins/tools/java/jdk1.6.0_27/jre/../lib/i386:https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/native/target/usr/local/lib] [DEBUG] Setting environment variable [MALLOC_ARENA_MAX]=[4] [DEBUG] dummy:dummy:jar:1.0 (selected for null) [DEBUG] org.apache.maven.surefire:surefire-junit3:jar:2.12:test (selected for test) [DEBUG] org.apache.maven.surefire:surefire-api:jar:2.12:test (selected for test) [DEBUG] Adding
Re: 10 failures when run test on hadoop-common-project
Hi Yanbo, You can look at the individual test output files (eg hadoop-common-project/hadoop-common/target/surefire-reports/org.apache.hadoop.net.TestTableMapping.txt for TestTableMapping and so on) for an indication as to why the test failed. Thanks, Eli On Mon, Jul 23, 2012 at 3:50 AM, Yanbo Liang yanboha...@gmail.com wrote: Hi All, I just ran the test code of hadoop-common-project with revision 1364560. It produced 10 FAILURES after I ran mvn test under the hadoop-common-project folder. But the latest build on the Jenkins server is Hadoop-Common-trunk#480, and no test failures or errors occurred there. Are there bugs in the test cases, or did I do something wrong? I list the FAILURES that I have encountered: Running org.apache.hadoop.net.TestTableMapping Tests run: 5, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.619 sec FAILURE! Running org.apache.hadoop.net.TestNetUtils Tests run: 38, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.562 sec FAILURE! Running org.apache.hadoop.util.TestDiskChecker Tests run: 14, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.875 sec FAILURE! Running org.apache.hadoop.ha.TestShellCommandFencer Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.158 sec FAILURE! Running org.apache.hadoop.security.TestSecurityUtil Tests run: 15, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.756 sec FAILURE! Running org.apache.hadoop.fs.TestFSMainOperationsLocalFileSystem Tests run: 49, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.371 sec FAILURE! Running org.apache.hadoop.fs.viewfs.TestViewFsTrash Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.601 sec FAILURE! Running org.apache.hadoop.fs.viewfs.TestFSMainOperationsLocalFileSystem Tests run: 98, Failures: 0, Errors: 98, Skipped: 0, Time elapsed: 2.437 sec FAILURE! Running org.apache.hadoop.fs.TestLocalDirAllocator Tests run: 27, Failures: 9, Errors: 0, Skipped: 0, Time elapsed: 1.152 sec FAILURE!
[jira] [Reopened] (HADOOP-8431) Running distcp wo args throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/HADOOP-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reopened HADOOP-8431: - I just tried this and confirmed it still fails with the latest build. In the future please try to reproduce the issue before you close it. hadoop-3.0.0-SNAPSHOT $ ./bin/hadoop distcp 12/07/23 19:21:48 ERROR tools.DistCp: Invalid arguments: java.lang.IllegalArgumentException: Target path not specified at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:86) at org.apache.hadoop.tools.DistCp.run(DistCp.java:102) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.tools.DistCp.main(DistCp.java:368) Running distcp wo args throws IllegalArgumentException -- Key: HADOOP-8431 URL: https://issues.apache.org/jira/browse/HADOOP-8431 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Labels: newbie Running distcp w/o args results in the following: {noformat} hadoop-3.0.0-SNAPSHOT $ ./bin/hadoop distcp 12/05/23 18:49:04 ERROR tools.DistCp: Invalid arguments: java.lang.IllegalArgumentException: Target path not specified at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:86) at org.apache.hadoop.tools.DistCp.run(DistCp.java:102) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.tools.DistCp.main(DistCp.java:368) Invalid arguments: Target path not specified {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
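For illustration only, here is a minimal sketch of the kind of argument guard the eventual fix could add so that a missing target path produces a usage message rather than a raw IllegalArgumentException stack trace; the class name, usage string, and exit behavior below are assumptions, not the actual DistCp code path.
{code}
// Hypothetical sketch: validate argument count before handing off to the parser.
public class DistCpArgsGuard {
  private static final String USAGE = "usage: distcp OPTIONS [source_path...] <target_path>";

  public static void main(String[] args) {
    if (args.length < 2) {
      System.err.println("Invalid arguments: target path not specified");
      System.err.println(USAGE);
      System.exit(-1);          // fail fast with usage instead of a stack trace
    }
    // ... only now delegate to the real OptionsParser / DistCp.run ...
  }
}
{code}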
[jira] [Created] (HADOOP-8616) ViewFS configuration requires a trailing slash
Eli Collins created HADOOP-8616: --- Summary: ViewFS configuration requires a trailing slash Key: HADOOP-8616 URL: https://issues.apache.org/jira/browse/HADOOP-8616 Project: Hadoop Common Issue Type: Bug Components: viewfs Affects Versions: 2.0.0-alpha, 0.23.0 Reporter: Eli Collins If the viewfs config doesn't have a trailing slash, commands like the following fail: {noformat} bash-3.2$ hadoop fs -ls -ls: Can not create a Path from an empty string Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [path ...] {noformat} We hit this problem with the following configuration because hdfs://ha-nn-uri does not have a trailing /. {noformat} <property> <name>fs.viewfs.mounttable.foo.link./nameservices/ha-nn-uri</name> <value>hdfs://ha-nn-uri</value> </property> {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
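A minimal sketch of the normalization idea, assuming the fix amounts to treating an authority-only link target like hdfs://ha-nn-uri as if it ended in a slash; the class and method names here are hypothetical, not the actual viewfs code.
{code}
import java.net.URI;

public class MountTargetNormalizer {
  // An authority-only URI has an empty path; resolve it to the root so
  // "hdfs://ha-nn-uri" and "hdfs://ha-nn-uri/" behave the same.
  static URI normalize(URI target) {
    String path = target.getPath();
    return (path == null || path.isEmpty()) ? target.resolve("/") : target;
  }

  public static void main(String[] args) {
    System.out.println(normalize(URI.create("hdfs://ha-nn-uri")));    // hdfs://ha-nn-uri/
    System.out.println(normalize(URI.create("hdfs://ha-nn-uri/a")));  // unchanged
  }
}
{code}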
Common and hdfs Jenkins jobs now running tests
Hey gang, The tests were disabled on the common and hdfs jenkins jobs for some reason. This was hiding test failures in the tests that are not run by test-patch (eg hadoop-dist, see HDFS-3690). I've re-enabled the tests on these jobs and filed HADOOP-8610 to get test-patch on Hadoop to run the root projects (eg hadoop-tools). Thanks, Eli
[jira] [Created] (HADOOP-8598) Server-side Trash
Eli Collins created HADOOP-8598: --- Summary: Server-side Trash Key: HADOOP-8598 URL: https://issues.apache.org/jira/browse/HADOOP-8598 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Critical There are a number of problems with Trash that continue to result in permanent data loss for users. The primary reasons trash is not used: - Trash is configured client-side and not enabled by default. - Trash is shell-only. FileSystem, WebHDFS, HttpFs, etc never use trash. - If trash fails, for example, because we can't create the trash directory or the move itself fails, trash is bypassed and the data is deleted. Trash was designed as a feature to help end users via the shell; however, in my experience the primary use of trash is to help administrators implement data retention policies (this was also the motivation for HADOOP-7460). One could argue that (periodic read-only) snapshots are a better solution to this problem, however snapshots are not slated for Hadoop 2.x and trash is complementary to snapshots (and backup) - eg you may create and delete data within your snapshot or backup window - so it makes sense to revisit trash's design. I think it's worth bringing trash's functionality in line with what users need. I propose we enable trash on a per-filesystem basis and implement it server-side. Ie trash becomes an HDFS feature enabled by administrators. Because the trash emptier lives in HDFS and users already have a per-filesystem trash directory we're mostly there already. The design preference from HADOOP-2514 was for trash to be implemented in user code; however, (a) in light of these problems, (b) because we have a lot more user-facing APIs than the shell, and (c) because clients increasingly span file systems (via federation and symlinks), this design choice makes less sense. This is why we already use a per-filesystem trash/home directory instead of the user's client-configured one - otherwise trash would not work because renames can't span file systems. In short, HDFS trash would work similarly to how it does today; the difference is that client delete APIs would result in a rename into trash (ala TrashPolicyDefault#moveToTrash) if trash is enabled. Like today, it would be renamed to the trash directory on the file system where the file being removed resides. The primary difference is that enablement and policy are configured server-side by administrators and apply regardless of the API used to access the filesystem. The one exception to this is that I think we should continue to support the explicit skipTrash shell option. The rationale for skipTrash (HADOOP-6080) is that a move to trash may fail in cases where an rm may not, if a user has a home directory quota and does a rmr /tonsOfData, for example. Without a way to bypass this, the user has no way (unless we revisit quotas, permissions or trash paths) to remove a directory they have permissions to remove without getting their quota adjusted by an admin. The skipTrash API can be implemented by adding an explicit FileSystem API that bypasses trash and modifying the shell to use it when skipTrash is enabled. Given that users must explicitly specify skipTrash, the API is less error prone. We could have the shell ask for confirmation and annotate the API as private to FsShell to discourage programmatic use. This is not ideal but can be done compatibly (unlike redefining quotas, permissions or trash paths).
In terms of compatibility, while this proposal is technically an incompatible change (client-side configuration that disables trash, and skipTrash used with a previous FsShell release, will now both be ignored if server-side trash is enabled, and non-HDFS file systems would need to make similar changes), I think it's worth targeting for Hadoop 2.x given that the new semantics preserve the current semantics. In 2.x I think we should preserve FsShell-based trash and support both it and server-side trash (defaults to disabled). For trunk/3.x I think we should remove the FsShell-based trash entirely and enable server-side trash by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
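To make the proposed semantics concrete, here is a rough sketch of the server-side delete path described above. The method signature, flags, and trash layout are illustrative assumptions only (the real TrashPolicyDefault preserves the full path under Current and handles checkpoints), not a design commitment.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ServerSideTrashSketch {
  // When trash is enabled for the filesystem, a delete becomes a rename into
  // the per-user trash directory on that same filesystem; only the explicit
  // skipTrash bypass (or trash being disabled) actually destroys data.
  static boolean delete(FileSystem fs, Path p, boolean recursive,
                        boolean trashEnabled, boolean skipTrash) throws IOException {
    if (!trashEnabled || skipTrash) {
      return fs.delete(p, recursive);
    }
    Path trashCurrent = new Path(fs.getHomeDirectory(), ".Trash/Current");
    fs.mkdirs(trashCurrent);                       // ensure the trash dir exists
    return fs.rename(p, new Path(trashCurrent, p.getName()));
  }
}
{code}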
[jira] [Created] (HADOOP-8594) Upgrade to findbugs 2
Eli Collins created HADOOP-8594: --- Summary: Upgrade to findbugs 2 Key: HADOOP-8594 URL: https://issues.apache.org/jira/browse/HADOOP-8594 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 2.0.0-alpha Reporter: Eli Collins Harsh recently ran findbugs 2 (instead of 1.3.9 which is what jenkins runs) and it showed thousands of warnings (they've made a lot of progress in findbugs releases). We should upgrade to findbugs 2 and fix these. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8595) Create security page in the docs and update the ASF page to link to it
Eli Collins created HADOOP-8595: --- Summary: Create security page in the docs and update the ASF page to link to it Key: HADOOP-8595 URL: https://issues.apache.org/jira/browse/HADOOP-8595 Project: Hadoop Common Issue Type: Task Reporter: Eli Collins We should (1) create a http://hadoop.apache.org/security.html with info on how to report a security problem, pointer to secur...@hadoop.apache.org, and (2) get it linked off the main ASF page: http://www.apache.org/security/projects.html. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8596) TestFileAppend4#testCompleteOtherLeaseHoldersFile times out
Eli Collins created HADOOP-8596: --- Summary: TestFileAppend4#testCompleteOtherLeaseHoldersFile times out Key: HADOOP-8596 URL: https://issues.apache.org/jira/browse/HADOOP-8596 Project: Hadoop Common Issue Type: Task Affects Versions: 2.0.0-alpha Reporter: Eli Collins Saw TestFileAppend4#testCompleteOtherLeaseHoldersFile time out on a recent jenkins run. Perhaps we need to bump the timeout? {noformat} Error Message test timed out after 6 milliseconds Stacktrace java.lang.Exception: test timed out after 6 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1186) at java.lang.Thread.join(Thread.java:1239) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.join(BPServiceActor.java:477) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.join(BPOfferService.java:259) at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.shutDownAll(BlockPoolManager.java:117) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1101) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1343) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1323) at org.apache.hadoop.hdfs.TestFileAppend4.testCompleteOtherLeaseHoldersFile(TestFileAppend4.java:289) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
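If the timeout itself is the culprit rather than a real hang, one plausible (and purely illustrative) remedy is simply raising the JUnit timeout on the affected test method; the value and test body below are placeholders, not the actual TestFileAppend4 source.
{code}
import org.junit.Test;

public class TestFileAppend4TimeoutSketch {
  // Illustrative only: bump the per-test timeout so a slow MiniDFSCluster
  // shutdown no longer trips the JUnit timeout.
  @Test(timeout = 120000)
  public void testCompleteOtherLeaseHoldersFile() throws Exception {
    // ... original test body unchanged ...
  }
}
{code}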
[jira] [Resolved] (HADOOP-7836) TestSaslRPC#testDigestAuthMethodHostBasedToken fails with hostname localhost.localdomain
[ https://issues.apache.org/jira/browse/HADOOP-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-7836. - Resolution: Fixed Fix Version/s: 1.2.0 Target Version/s: (was: 1.1.0) I've committed this, thanks Daryn! Do we need a jira for the same test forward ported to trunk? TestSaslRPC#testDigestAuthMethodHostBasedToken fails with hostname localhost.localdomain Key: HADOOP-7836 URL: https://issues.apache.org/jira/browse/HADOOP-7836 Project: Hadoop Common Issue Type: Bug Components: ipc, test Affects Versions: 1.1.0 Reporter: Eli Collins Priority: Minor Fix For: 1.2.0 Attachments: HADOOP-7836.patch, hadoop-7836.txt TestSaslRPC#testDigestAuthMethodHostBasedToken fails on branch-1 on some hosts. null expected:<localhost[]> but was:<localhost[.localdomain]> junit.framework.ComparisonFailure: null expected:<localhost[]> but was:<localhost[.localdomain]> null expected:<[localhost]> but was:<[eli-thinkpad]> junit.framework.ComparisonFailure: null expected:<[localhost]> but was:<[eli-thinkpad]> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8592) Hadoop-auth should use o.a.h.util.Time methods instead of System#currentTimeMillis
Eli Collins created HADOOP-8592: --- Summary: Hadoop-auth should use o.a.h.util.Time methods instead of System#currentTimeMillis Key: HADOOP-8592 URL: https://issues.apache.org/jira/browse/HADOOP-8592 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Eli Collins Priority: Minor HDFS-3641 moved HDFS' Time methods to common so they can be used by MR (and eventually others). We should replace uses of System#currentTimeMillis in MR with Time#now (or Time#monotonicNow when computing intervals, eg to sleep). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
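A small sketch of the substitution being asked for; the class and field names below are made up, while Time.now() and Time.monotonicNow() are the methods HDFS-3641 moved into org.apache.hadoop.util.Time.
{code}
import org.apache.hadoop.util.Time;

public class TimeUsageSketch {
  private long issuedAt;

  void stamp() {
    issuedAt = Time.now();                         // instead of System.currentTimeMillis()
  }

  long elapsedMillis(long startMonotonic) {
    return Time.monotonicNow() - startMonotonic;   // monotonic clock for intervals/sleeps
  }
}
{code}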
[jira] [Resolved] (HADOOP-8587) HarFileSystem access of harMetaCache isn't threadsafe
[ https://issues.apache.org/jira/browse/HADOOP-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8587. - Resolution: Fixed Fix Version/s: 2.0.1-alpha 1.2.0 Target Version/s: (was: 1.2.0, 2.0.1-alpha) Hadoop Flags: Reviewed Thanks for the review Daryn. I've committed this to trunk and merged to branch-2 and branch-1. HarFileSystem access of harMetaCache isn't threadsafe - Key: HADOOP-8587 URL: https://issues.apache.org/jira/browse/HADOOP-8587 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Fix For: 1.2.0, 2.0.1-alpha Attachments: hadoop-8587-b1.txt, hadoop-8587.txt, hadoop-8587.txt HarFileSystem's use of the static harMetaCache map is not threadsafe. Credit to Todd for pointing this out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: a bug in ViewFs tests
Hey Andrew. I'd open a new jira, and thanks for the find btw! Thanks, Eli On Wed, Jul 11, 2012 at 4:08 PM, Andrey Klochkov akloch...@griddynamics.com wrote: Hi, I noticed that the fix done in HADOOP-8036 (failing ViewFs tests) was reverted later when resolving HADOOP-8129, so the bug exists both in 0.23 and 2.0. I'm going to provide an alternative fix. Should I reopen HADOOP-8036 or create a new one instead? Thanks. -- Andrey Klochkov
[jira] [Resolved] (HADOOP-8584) test-patch.sh should not immediately exit when no tests are added or modified
[ https://issues.apache.org/jira/browse/HADOOP-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8584. - Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed I've committed this to trunk. test-patch.sh should not immediately exit when no tests are added or modified - Key: HADOOP-8584 URL: https://issues.apache.org/jira/browse/HADOOP-8584 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 3.0.0 Attachments: HADOOP-8584.001.patch test-patch.sh should not immediately exit when no tests are added or modified. Although it's good to note whether or not a patch introduces or modifies tests, it's not good to abort the Jenkins patch process if it did not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8587) HarFileSystem access of harMetaCache isn't threadsafe
Eli Collins created HADOOP-8587: --- Summary: HarFileSystem access of harMetaCache isn't threadsafe Key: HADOOP-8587 URL: https://issues.apache.org/jira/browse/HADOOP-8587 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins HarFileSystem's use of the static harMetaCache map is not threadsafe. Credit to Todd for pointing this out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
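One way (of several) such a shared static cache can be made safe for concurrent access is an atomic check-then-insert on a ConcurrentHashMap. The field and value types below are simplified stand-ins for the actual HarFileSystem metadata, and this is not necessarily how HADOOP-8587 was ultimately fixed.
{code}
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class HarMetaCacheSketch {
  private static final ConcurrentMap<URI, Object> harMetaCache =
      new ConcurrentHashMap<URI, Object>();

  static Object getOrLoad(URI archiveUri) {
    Object cached = harMetaCache.get(archiveUri);
    if (cached != null) {
      return cached;
    }
    Object loaded = loadMetadata(archiveUri);
    // putIfAbsent makes the insert atomic; keep whichever value won the race.
    Object raced = harMetaCache.putIfAbsent(archiveUri, loaded);
    return raced != null ? raced : loaded;
  }

  private static Object loadMetadata(URI uri) {
    return new Object();   // placeholder for reading the archive's index files
  }
}
{code}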
[jira] [Resolved] (HADOOP-8554) KerberosAuthenticator should use the configured principal
[ https://issues.apache.org/jira/browse/HADOOP-8554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8554. - Resolution: Invalid You're right, thanks for the explanation, I didn't realize the principal config was server-side only. Also, the reason I hit this with webhdfs and not hftp is that hftp doesn't support SPNEGO. KerberosAuthenticator should use the configured principal - Key: HADOOP-8554 URL: https://issues.apache.org/jira/browse/HADOOP-8554 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 1.0.0, 2.0.0-alpha, 2.0.1-alpha, 3.0.0 Reporter: Eli Collins Labels: security, webconsole In KerberosAuthenticator we construct the principal as follows: {code} String servicePrincipal = "HTTP/" + KerberosAuthenticator.this.url.getHost(); {code} Seems like we should use the configured hadoop.http.authentication.kerberos.principal instead right? I hit this issue as a distcp using webhdfs://localhost fails because HTTP/localhost is not in the kerb DB but using webhdfs://eli-thinkpad works because HTTP/eli-thinkpad is (and is my configured principal). distcp using Hftp://localhost with the same config works so it looks like this check is webhdfs specific for some reason (webhdfs is using spnego and hftp is not?). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8565) AuthenticationFilter#doFilter warns unconditionally when using SPNEGO
Eli Collins created HADOOP-8565: --- Summary: AuthenticationFilter#doFilter warns unconditionally when using SPNEGO Key: HADOOP-8565 URL: https://issues.apache.org/jira/browse/HADOOP-8565 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha, 1.0.3 Reporter: Eli Collins The following code in AuthenticationFilter#doFilter throws AuthenticationException (and warns) unconditionally because KerberosAuthenticator#authenticate returns null if SPNEGO is used. {code} token = authHandler.authenticate(httpRequest, httpResponse); ... if (token != null) { ... } else { throw new AuthenticationException } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
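A sketch of the distinction the report is pointing at, assuming (as the description suggests) that a null token during SPNEGO means the handler has already sent a 401 challenge and the negotiation simply has not finished; the helper method below is hypothetical, not the actual doFilter code.
{code}
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.server.AuthenticationHandler;
import org.apache.hadoop.security.authentication.server.AuthenticationToken;

public class DoFilterSketch {
  AuthenticationToken authenticateOnce(AuthenticationHandler handler,
      HttpServletRequest req, HttpServletResponse resp) throws Exception {
    AuthenticationToken token = handler.authenticate(req, resp);
    if (token == null && resp.isCommitted()) {
      return null;   // challenge already sent; negotiation continues on the next request
    }
    if (token == null) {
      throw new AuthenticationException("Authentication failed");
    }
    return token;    // authenticated; proceed with the filter chain
  }
}
{code}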
[jira] [Created] (HADOOP-8568) DNS#reverseDns fails on IPv6 addresses
Eli Collins created HADOOP-8568: --- Summary: DNS#reverseDns fails on IPv6 addresses Key: HADOOP-8568 URL: https://issues.apache.org/jira/browse/HADOOP-8568 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins DNS#reverseDns assumes hostIp is a v4 address (4 parts separated by dots), blows up if given a v6 address: {noformat} Caused by: java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.hadoop.net.DNS.reverseDns(DNS.java:79) at org.apache.hadoop.net.DNS.getHosts(DNS.java:237) at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:340) at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:358) at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:337) at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:235) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1649) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
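For illustration, a self-contained sketch of building the reverse-lookup name from the raw address bytes so that IPv6 addresses do not hit the array-index assumption; this is not the HADOOP-8568 patch itself, just one way the lookup name can be formed.
{code}
import java.net.Inet6Address;
import java.net.InetAddress;

public class ReverseNameSketch {
  static String reverseName(InetAddress addr) {
    byte[] bytes = addr.getAddress();
    StringBuilder sb = new StringBuilder();
    if (addr instanceof Inet6Address) {
      // IPv6: nibbles in reverse order under ip6.arpa
      for (int i = bytes.length - 1; i >= 0; i--) {
        sb.append(Integer.toHexString(bytes[i] & 0x0f)).append('.');
        sb.append(Integer.toHexString((bytes[i] >> 4) & 0x0f)).append('.');
      }
      return sb.append("ip6.arpa").toString();
    }
    // IPv4: octets in reverse order under in-addr.arpa
    for (int i = bytes.length - 1; i >= 0; i--) {
      sb.append(bytes[i] & 0xff).append('.');
    }
    return sb.append("in-addr.arpa").toString();
  }
}
{code}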
[jira] [Created] (HADOOP-8554) KerberosAuthenticator should use the configured principal
Eli Collins created HADOOP-8554: --- Summary: KerberosAuthenticator should use the configured principal Key: HADOOP-8554 URL: https://issues.apache.org/jira/browse/HADOOP-8554 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 1.0.0 Reporter: Eli Collins In KerberosAuthenticator we construct the principal as follows: {code} String servicePrincipal = "HTTP/" + KerberosAuthenticator.this.url.getHost(); {code} Seems like we should use the configured hadoop.http.authentication.kerberos.principal instead right? I hit this issue as a distcp using webhdfs://localhost fails because HTTP/localhost is not in the kerb DB but using webhdfs://eli-thinkpad works because HTTP/eli-thinkpad is (and is my configured principal). distcp using Hftp://localhost with the same config works so it looks like this check is webhdfs specific for some reason (webhdfs is using spnego and hftp is not?). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8546) missing hdfs user guide and command pages from hadoop project page
[ https://issues.apache.org/jira/browse/HADOOP-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8546. - Resolution: Duplicate Per HDFS-3458 the forrest docs needed to be ported to APT. missing hdfs user guide and command pages from hadoop project page Key: HADOOP-8546 URL: https://issues.apache.org/jira/browse/HADOOP-8546 Project: Hadoop Common Issue Type: Improvement Reporter: Jason Shih Priority: Trivial Can't find the webmaster contact, thus filing an issue in jira. The following two links are missing from the hadoop project page; maybe you can help restore them. Thanks http://hadoop.apache.org/common/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs_user_guide.html http://hadoop.apache.org/common/docs/current/hadoop-project-dist/hadoop-common/commands_manual.html Cheers, Jason -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Clarification about 2.* branches
On Wed, Jun 6, 2012 at 2:10 PM, Eli Collins e...@cloudera.com wrote: Hey Vinod, Out of curiosity, why delete the branch? It might make spelunking for svn revs that correspond to a release hard. If we're deleting old release branches, we might as well delete the others (branch-0.23.0, branch-0.23.0-rc0, branch-0.23.1, branch-0.23.2, etc) as well? Oops, shouldn't have listed branch-0.23.2 as 0.23.2 isn't out yet. Thanks, Eli On Wed, Jun 6, 2012 at 1:52 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: I checked with Nicholas also who did the only merge into branch-2.0.0-alpha after the release and got a confirmation that I can delete the branch. Removing the branch now. Thanks, +Vinod On Jun 5, 2012, at 5:45 PM, Arun C Murthy wrote: +1 for blowing away branch-2.0.0-alpha now, we have the release tag if necessary. thanks, Arun On Jun 5, 2012, at 4:40 PM, Vinod Kumar Vavilapalli wrote: Hi, I see two branches for 2.* now: branch-2 and branch-2.0.0-alpha and two sections in CHANGES.txt: 2.0.1-alpha unreleased and 2.0.0-alpha released. It is a little confusing; it seems like branch-2.0.0-alpha was a staging branch for the release and can be thrown away. Is anyone committing patches to both branch-2 and branch-2.0.0-alpha? If not, I'll blow away the latter and rename the section in CHANGES.txt to branch-2. Thanks, +Vinod -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
[jira] [Resolved] (HADOOP-8430) Backport new FileSystem methods introduced by HADOOP-8014 to branch-1
[ https://issues.apache.org/jira/browse/HADOOP-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-8430. - Resolution: Fixed Fix Version/s: 1.1.0 Target Version/s: (was: 1.1.0) Thanks Daryn. I've committed this to branch-1 and merged to branch-1.1 Backport new FileSystem methods introduced by HADOOP-8014 to branch-1 -- Key: HADOOP-8430 URL: https://issues.apache.org/jira/browse/HADOOP-8430 Project: Hadoop Common Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0 Attachments: hadoop-8430-1.txt Per HADOOP-8422 let's backport the new FileSystem methods from HADOOP-8014 to branch-1 so users can transition over in Hadoop 1.x releases, which helps upstream projects like HBase work against federation (see HBASE-6067). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Hadoop view on Jenkins
Hey gang, fyi, I created a view in jenkins so you can see all the enabled Hadoop builds: https://builds.apache.org/view/Hadoop Thanks, Eli
[jira] [Created] (HADOOP-8463) hadoop.security.auth_to_local needs a key definition and doc
Eli Collins created HADOOP-8463: --- Summary: hadoop.security.auth_to_local needs a key definition and doc Key: HADOOP-8463 URL: https://issues.apache.org/jira/browse/HADOOP-8463 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.0.0-alpha Reporter: Eli Collins hadoop.security.auth_to_local should be defined in CommonConfigurationKeysPublic.java, and the uses of the raw string in common and hdfs should be updated to use the key. Its definition in core-site.xml should also be updated with a description. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
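A minimal sketch of what "defined in CommonConfigurationKeysPublic.java" could look like; the constant name and the surrounding class are assumptions for illustration, not the committed change.
{code}
public class CommonConfigurationKeysPublicSketch {
  /** See core-default.xml for the default value and description. */
  public static final String HADOOP_SECURITY_AUTH_TO_LOCAL =
      "hadoop.security.auth_to_local";
}
{code}
Call sites would then read conf.get(HADOOP_SECURITY_AUTH_TO_LOCAL) instead of repeating the raw string.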
Re: cmake
On Wed, May 23, 2012 at 8:58 AM, Owen O'Malley omal...@apache.org wrote: After a quick look at cmake, it seems reasonable. Please use feature tests rather than OS tests. (By that I mean that if Solaris needs foobar.h and RHEL needs sys/foobar.h to get the definition for foobar, the code should ifdef whether they need sys/foobar.h and not if it is solaris or not.) On a related note, on what platforms does the native code currently compile and work? I've compiled and tested on: RHEL CentOS 5 RHEL CentOS 6 I've heard of it compiling on: Solaris What other ones are out there? Also multiple versions of OUL, SLES, Debian, and Ubuntu.
[jira] [Created] (HADOOP-8430) Backport new FileSystem methods introduced by HADOOP-8014 to branch-1
Eli Collins created HADOOP-8430: --- Summary: Backport new FileSystem methods introduced by HADOOP-8014 to branch-1 Key: HADOOP-8430 URL: https://issues.apache.org/jira/browse/HADOOP-8430 Project: Hadoop Common Issue Type: Bug Reporter: Eli Collins Assignee: Eli Collins Per HADOOP-8422 let's backport the new FileSystem methods from HADOOP-8014 to branch-1 so users can transition over in Hadoop 1.x releases, which helps upstream projects like HBase work against federation (see HBASE-6067). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
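For context, a short sketch of how a client such as HBase might use the per-path variants added by HADOOP-8014 so that defaults come from the filesystem the path actually lives on; the path and surrounding class are illustrative, not HBase code.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerPathDefaultsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path table = new Path("hdfs://ns1/hbase/mytable");    // illustrative path
    FileSystem fs = table.getFileSystem(conf);
    long blockSize = fs.getDefaultBlockSize(table);       // per-path overload
    short replication = fs.getDefaultReplication(table);  // per-path overload
    System.out.println(blockSize + " / " + replication);
  }
}
{code}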
[jira] [Created] (HADOOP-8431) Running distcp wo args throws IllegalArgumentException
Eli Collins created HADOOP-8431: --- Summary: Running distcp wo args throws IllegalArgumentException Key: HADOOP-8431 URL: https://issues.apache.org/jira/browse/HADOOP-8431 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Running distcp w/o args results in the following: {noformat} hadoop-3.0.0-SNAPSHOT $ ./bin/hadoop distcp 12/05/23 18:49:04 ERROR tools.DistCp: Invalid arguments: java.lang.IllegalArgumentException: Target path not specified at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:86) at org.apache.hadoop.tools.DistCp.run(DistCp.java:102) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.tools.DistCp.main(DistCp.java:368) Invalid arguments: Target path not specified {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7069) Replace forrest with supported framework
[ https://issues.apache.org/jira/browse/HADOOP-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HADOOP-7069. - Resolution: Fixed Fix Version/s: (was: 0.24.0) 2.0.0 Replace forrest with supported framework Key: HADOOP-7069 URL: https://issues.apache.org/jira/browse/HADOOP-7069 Project: Hadoop Common Issue Type: Improvement Components: documentation Reporter: Jakob Homan Fix For: 2.0.0 It's time to burn down the forrest. Apache forrest, which is used to generate the documentation for all three subprojects, has not had a release in several years (0.8, the version we use was released April 18, 2007), and requires JDK5, which was EOL'ed in November 2009. Since it doesn't seem likely Forrest will be developed any more, and JDK5 is not shipped with recent OSX versions, or included by default in most linux distros, we should look to find a new documentation system and convert the current docs to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8422) FileSystem#getDefaultBlockSize
Eli Collins created HADOOP-8422: --- Summary: FileSystem#getDefaultBlockSize Key: HADOOP-8422 URL: https://issues.apache.org/jira/browse/HADOOP-8422 Project: Hadoop Common Issue Type: Bug Reporter: Eli Collins -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: cmake
+1 having a build tool that supports multiple platforms is worth the dependency. I've also had good experiences with cmake. On Mon, May 21, 2012 at 6:00 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: Hi all, We'd like to use CMake instead of autotools to build native (C/C++) code in Hadoop. There are a lot of reasons to want to do this. For one thing, it is not feasible to use autotools on the Windows platform, because it depends on UNIX shell scripts, the m4 macro processor, and some other pieces of infrastructure which are not present on Windows. For another thing, CMake builds are substantially simpler and faster, because there is only one layer of generated code. With autotools, you have automake generating m4 code which autoconf reads, which it uses to generate a UNIX shell script, which then generates another UNIX shell script, which eventually generates Makefiles. CMake simply generates Makefiles out of CMakeLists.txt files-- much simpler to understand and debug, and much faster. CMake is a lot easier to learn. automake error messages can be very, very confusing. This is because you are essentially debugging a pile of shell scripts and macros, rather than a coherent whole. So you see error messages like autoreconf: cannot empty /tmp/ar0.4849: Is a directory or Can't locate object method path via package Autom4te... and so forth. CMake error messages come from the CMake application and they almost always immediately point you to the problem. From a build point of view, the net result of adopting CMake would be that you would no longer need automake and related programs installed to build the native parts of Hadoop. Instead, you would need CMake installed. CMake is packaged by Red Hat, even in RHEL5, so it shouldn't be difficult to install locally. It's also available for Mac OS X and Windows, as I mentioned earlier. The JIRA for this work is at https://issues.apache.org/jira/browse/HADOOP-8368 Thanks for reading. sincerely, Colin