[jira] [Created] (HADOOP-9091) Allow daemon startup when at least 1 (or configurable) disk is in an OK state.
Jelle Smet created HADOOP-9091:
--
Summary: Allow daemon startup when at least 1 (or configurable) disk is in an OK state.
Key: HADOOP-9091
URL: https://issues.apache.org/jira/browse/HADOOP-9091
Project: Hadoop Common
Issue Type: Improvement
Components: fs
Affects Versions: 0.20.2
Reporter: Jelle Smet

The given example uses datanode disk definitions, but it should be applicable to any configuration where a list of disks is provided. I have multiple local disks defined for a datanode:

<property>
  <name>dfs.data.dir</name>
  <value>/data/01/dfs/dn,/data/02/dfs/dn,/data/03/dfs/dn,/data/04/dfs/dn,/data/05/dfs/dn,/data/06/dfs/dn</value>
  <final>true</final>
</property>

When one of those disks breaks and is unmounted, the mountpoint (such as /data/03 in this example) becomes a regular directory that lacks the permissions and directory structure Hadoop expects. When this happens, the datanode fails to restart, even though enough disks are actually in an OK state to proceed. The only way around this is to alter the configuration and omit that specific disk. In my opinion, it would be more practical to let Hadoop daemons start when at least one disk/partition in the provided list is in a usable state. This prevents having to roll out custom configurations for systems that are temporarily missing a disk (and therefore its directory layout). It might also be made configurable, requiring at least X of the available partitions to be in an OK state.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-9091) Allow daemon startup when at least 1 (or configurable) disk is in an OK state.
[ https://issues.apache.org/jira/browse/HADOOP-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HADOOP-9091.
Resolution: Fixed

This feature is already available in all our current releases via the DN volume failure toleration properties. Please see https://issues.apache.org/jira/browse/HDFS-1592. Resolving as not a problem. Please update to an inclusive release to have this addressed in your environment.
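The volume failure toleration the resolution points to is configured through a DataNode property; a minimal hdfs-site.xml sketch (the property name comes from HDFS-1592; the value here is only an example):

```xml
<!-- Allow the DataNode to start even if up to 2 of its configured
     data directories are unusable (0, the default, fails fast). -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
```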
[jira] [Created] (HADOOP-9092) Coverage fixing for org.apache.hadoop.mapreduce.jobhistory
Aleksey Gorshkov created HADOOP-9092:
Summary: Coverage fixing for org.apache.hadoop.mapreduce.jobhistory
Key: HADOOP-9092
URL: https://issues.apache.org/jira/browse/HADOOP-9092
Project: Hadoop Common
Issue Type: Test
Components: tools
Reporter: Aleksey Gorshkov

Coverage fixing for package org.apache.hadoop.mapreduce.jobhistory
wiki write access request
Hi, I'm going through the Hadoop deployment instructions (http://wiki.apache.org/hadoop/GettingStartedWithHadoop) and am occasionally seeing things that can be better clarified or need updating. Can I have write access to the Wiki so I can make text updates? I'm user GlenMazza. Thanks, Glen -- Glen Mazza Talend Community Coders - coders.talend.com blog: www.jroller.com/gmazza
Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
+1, +1, 0

On 11/24/12 2:13 PM, Matt Foley ma...@apache.org wrote:

For discussion, please see previous thread [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack. This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency. Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1. Please vote +1, 0, -1.

3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency. Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today. Vote closes at 12:30pm PST on Saturday 1 December.

Personally, my vote is +1, +1, +1. I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks. Best regards, --Matt
Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
0, +1, -1 (non-binding)

Also, it feels like maybe the discussion should have been kept open a little longer; the Thanksgiving holidays last week meant that people may have missed it. Cheers, Adam

On Nov 26, 2012, at 10:16 AM, Robert Evans wrote: +1, +1, 0
Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
-1, +1, -1
Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
-1, +1, -1 Thanks
Re: trailing whitespace
I've never understood why folks get worked up over a little trailing whitespace here and there, since you can't see it and it doesn't affect correctness. Spurious whitespace changes that make a review harder - those are annoying. Trailing whitespace inadvertently left on lines where legitimate changes were made in a patch - doesn't seem too harmful to me.

Trailing whitespace is annoying because: if your editor is set to strip it, it will produce a large patch; if you press End on a line, the cursor lands on some space after the text rather than at the end of the text, which costs extra keystrokes for cursor movement and is especially annoying on a wrapped line; avoiding it is good, standard practice, and git and other tools highlight it in red; and if you use "ignore whitespace" in git diff, it often produces a patch that fails to apply. Trailing whitespace can be stripped by a pre-commit hook.
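The transformation such a pre-commit hook applies per line is simple; a minimal sketch in Java (illustrative only, not Hadoop's actual tooling):

```java
// Sketch of the per-line cleanup a trailing-whitespace pre-commit hook
// would perform: drop spaces and tabs at the end of each line.
public class StripTrailing {
    static String strip(String line) {
        int end = line.length();
        // Walk back past trailing spaces and tabs.
        while (end > 0) {
            char c = line.charAt(end - 1);
            if (c != ' ' && c != '\t') {
                break;
            }
            end--;
        }
        return line.substring(0, end);
    }
}
```

A hook would apply this only to lines the patch actually touches, to avoid the large-diff problem mentioned above.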
[jira] [Resolved] (HADOOP-9066) Sorting for FileStatus[]
[ https://issues.apache.org/jira/browse/HADOOP-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HADOOP-9066.
Resolution: Invalid

Since HADOOP-8934 is already adding FileStatus-data-based sorting in a place that matters, and this JIRA seems to just add a simple example of using FileStatus comparators, I am resolving this as Invalid at the moment: the example isn't of much value so far, given that the Javadoc for FileStatus is already clear and there's no use-case for this in MR, etc.

Sorting for FileStatus[]
Key: HADOOP-9066
URL: https://issues.apache.org/jira/browse/HADOOP-9066
Project: Hadoop Common
Issue Type: Improvement
Environment: java7, RedHat9, Hadoop 0.20.2, eclipse-jee-juno-linux-gtk.tar.gz
Reporter: david king
Labels: patch
Attachments: ConcreteFileStatusAscComparable.java, ConcreteFileStatusDescComparable.java, FileStatusComparable.java, FileStatusTool.java, TestFileStatusTool.java

I will submit a patch with a FileStatusTool used to sort FileStatus[] by a Comparator; the Comparator can not only be customized, but example code is also provided.
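Sorting such an array needs little more than a plain java.util.Comparator; a minimal self-contained sketch (a simple stand-in class is used here instead of Hadoop's actual FileStatus, so this compiles without Hadoop on the classpath):

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortStatuses {
    // Stand-in for org.apache.hadoop.fs.FileStatus: just the fields we sort on.
    static final class Status {
        final String path;
        final long modificationTime;
        Status(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    // Ascending by path; call .reversed() on it for descending order.
    static final Comparator<Status> BY_PATH = Comparator.comparing(s -> s.path);

    // Returns a sorted copy, leaving the input array untouched.
    static Status[] sortedByPath(Status[] statuses) {
        Status[] copy = statuses.clone();
        Arrays.sort(copy, BY_PATH);
        return copy;
    }
}
```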
Re: [VOTE] Hadoop Release 1.1.1
+1 (non-binding). Ran system tests on secure and non-secure clusters; no new regressions were found.
--
Arpit Gupta, Hortonworks Inc., http://hortonworks.com/

On Nov 20, 2012, at 2:07 PM, Matt Foley ma...@apache.org wrote:

Hello, Hadoop-1.1.1-rc0 is now available for evaluation and vote: http://people.apache.org/~mattf/hadoop-1.1.1-rc0/ or in the Nexus repository. The release notes are available at http://people.apache.org/~mattf/hadoop-1.1.1-rc0/releasenotes.html 20 bugs have been fixed compared to release 1.1.0, with no backward incompatibilities. I took the opportunity to ensure that all branch-1.0 changes are in 1.1.1, and all branch-1.1 changes are in branch-1. The jira database has been made consistent. Please vote. Voting will end on Tuesday 27 Nov., at 2:05pm PST. Thank you, --Matt Foley, release manager
Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
+1, +1, +1 -Giri
Re: commit access to hadoop
The main feature is that when you get the +1 vote you yourself get to deal with the grunge work of applying patches to one or more svn branches, resyncing that with the git branches you inevitably do your own work on.

No, the main feature is a major speed advantage. It takes forever to get something committed. I was annoyed with Apache Nutch last year and forked it; here is a snapshot from the forked codebase: http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439 It is now 160k LOC on top of Apache Nutch 1.4. If I had worked with these guys, it would never have been done, because it took them 4 months to get a 200-line patch reviewed. Hadoop has a huge backlog of patches; you need way more committers than you have today. I simply could not assign a person to work on Hadoop full-time, because if he submitted a mere 5 patches per day, you would never be able to process them. Your current development process fails to scale. What are your plans for making development move faster?
Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
In the PROPOSAL thread you indicated this was for Hadoop 1 because it is Ant based, and the main reason was to remove saveVersion.sh. Your #3 was not discussed in the proposal, was it?

It was part of the original proposal but not discussed much, because the language war was a more attractive topic. Would you prefer a vote like this?
1. Using an external language vs. a Maven plugin for build tasks.
2. Using an external language for startup scripts vs. a JVM scripting language, such as Jython as used in WebSphere.
3. Choosing Python as the external language.
Re: trailing whitespace
OK, if folks want to do something to get rid of trailing whitespace in the project I won't object, but it doesn't seem like that big a deal to me. A pre-commit hook makes sense to me. I just don't want to see the QA bot flag patches containing trailing whitespace, thus requiring more round trips on patches.
--
Aaron T. Myers, Software Engineer, Cloudera
[jira] [Created] (HADOOP-9094) Add interface audience and stability annotation to PathExceptions
Suresh Srinivas created HADOOP-9094:
---
Summary: Add interface audience and stability annotation to PathExceptions
Key: HADOOP-9094
URL: https://issues.apache.org/jira/browse/HADOOP-9094
Project: Hadoop Common
Issue Type: Bug
Components: fs
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

HADOOP-9093 moved path-related exceptions to o.a.h.fs. This jira tracks adding interface audience and stability annotations to those exceptions. It also tracks this comment from HADOOP-9093:

bq. I propose using FileNotFoundException instead of PathNotFoundException as it is already extensively used. Similarly use AccessControlException instead of PathAccessException. If folks agree, I will make that change in the next patch. Alternatively we could at least make these exceptions subclasses of the exceptions I am proposing replacing them with.
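The change being tracked is annotating exception classes; a self-contained sketch of the pattern (the annotation types below are minimal local stand-ins for Hadoop's o.a.h.classification annotations, declared here only so the example compiles on its own):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class Annotated {
    // Stand-ins for Hadoop's InterfaceAudience/InterfaceStability annotations.
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Public {}
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Evolving {}

    // The pattern under discussion: mark the exception's intended audience
    // and stability directly on the class.
    @Public
    @Evolving
    public static class PathNotFoundException extends java.io.IOException {
        public PathNotFoundException(String path) {
            super(path + ": No such file or directory");
        }
    }
}
```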
[jira] [Resolved] (HADOOP-9095) TestNNThroughputBenchmark fails in branch-1
[ https://issues.apache.org/jira/browse/HADOOP-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HADOOP-9095.
Resolution: Fixed
Fix Version/s: 1-win, 1.2.0

I have committed this. Thanks Jing!

TestNNThroughputBenchmark fails in branch-1
---
Key: HADOOP-9095
URL: https://issues.apache.org/jira/browse/HADOOP-9095
Project: Hadoop Common
Issue Type: Bug
Components: net
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
Priority: Minor
Fix For: 1.2.0, 1-win
Attachments: HDFS-4204.b1.001.patch, HDFS-4204.b1.002.patch, HDFS-4204.b1.003.patch

{noformat}
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:686)
    at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:539)
    at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:562)
    at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:88)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1047)
    ...
    at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:377)
{noformat}
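The stack trace shows a charAt(0) call failing on an empty host name. A sketch of the kind of guard involved (illustrative only; this is not the actual HDFS-4204 patch, and the normalize logic here is simplified):

```java
public class HostNames {
    // Guard against charAt(0) on an empty string, which throws the
    // StringIndexOutOfBoundsException seen in the benchmark's stack trace.
    static String normalize(String name) {
        if (name == null || name.isEmpty()) {
            return name; // nothing to inspect; avoid charAt(0) on ""
        }
        // Simplified rule for the sketch: names starting with a digit are
        // treated as already-numeric addresses; others are lower-cased.
        if (Character.isDigit(name.charAt(0))) {
            return name;
        }
        return name.toLowerCase(java.util.Locale.ROOT);
    }
}
```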
Re: Anybody know how to configure SSH for eclipse plugin
In file org.apache.hadoop.eclipse.server.HadoopServer.java (version 1.0.4) I find the following comment. Does it mean I can set up an SSH tunnel and Eclipse can connect to the remote cluster's main node? Any help will be appreciated!

 * <p>
 * This class does not create any SSH connection anymore. Tunneling must be
 * setup outside of Eclipse for now (using Putty or <tt>ssh -D&lt;port&gt; &lt;host&gt;</tt>)

On Mon, Nov 26, 2012 at 7:17 PM, yiyu jia jia.y...@gmail.com wrote:

Hi, can anybody tell me how to configure SSH for the Eclipse plugin? I guess the Eclipse plugin uses SSH to connect with Map/Reduce locations, but I found that it always uses my local machine's account name to connect to the Hadoop host servers. thanks and regards, Yiyu
--
Mr. Jia Yiyu | Email: jia.y...@gmail.com | Web: http://yiyujia.blogspot.com/
Re: Anybody know how to configure SSH for eclipse plugin
Hi all, how can I make the Eclipse plugin support an SSH connection that needs a password (or a certificate stored somewhere)? thanks in advance! Yiyu
--
Mr. Jia Yiyu | Email: jia.y...@gmail.com | Web: http://yiyujia.blogspot.com/
Refactor MetricsSystemImpl to allow for an on-demand publish system (HADOOP-9090)
Hi all, Yesterday I filed a JIRA (HADOOP-9090, https://issues.apache.org/jira/browse/HADOOP-9090) to propose a refactoring of the MetricsSystemImpl class - the default (only?) implementation of the Metrics2 system - to factor out some common code into a base class and have another simple implementation that just does on-demand publishing of metrics instead of the default periodic publishing. The main motivation for filing this JIRA and the attached patch is that we (Microsoft) need to publish metrics out of short-lived processes (think hadoop fs -ls), and the periodic behavior of the default implementation doesn't really work well for those. We could write our own metrics system implementation (and we'll probably do that in the short term), but that would mean duplicating a lot of great code that's already in the MetricsSystemImpl class, hence the proposal to factor out the common code into a base class. Does that sound reasonable? Please comment on the JIRA directly or reply here - if the proposal sounds awful (or great) or there's something I'm fundamentally missing, I'd love to hear that feedback. Thanks! Mostafa
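The on-demand pattern being proposed can be sketched generically; note that all names here (Sink, register, publishNow) are invented for the example and are not the real Hadoop Metrics2 API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch of on-demand (vs. periodic) metrics publishing:
// instead of a timer thread flushing every N seconds, a short-lived
// process calls publishNow() once before exiting.
public class OnDemandMetrics {
    interface Sink { void put(String name, long value); }

    private final List<Sink> sinks = new ArrayList<>();
    private final List<String> names = new ArrayList<>();
    private final List<Supplier<Long>> sources = new ArrayList<>();

    void register(String name, Supplier<Long> source) {
        names.add(name);
        sources.add(source);
    }

    void addSink(Sink sink) { sinks.add(sink); }

    // Snapshot every registered metric and push it to every sink, now.
    void publishNow() {
        for (int i = 0; i < names.size(); i++) {
            long value = sources.get(i).get();
            for (Sink s : sinks) {
                s.put(names.get(i), value);
            }
        }
    }
}
```

The refactoring proposed in the JIRA would put the registration and sink plumbing in a shared base class, with the periodic and on-demand variants differing only in when publish is triggered.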