[jira] [Created] (HADOOP-9091) Allow daemon startup when at least 1 (or configurable) disk is in an OK state.

2012-11-26 Thread Jelle Smet (JIRA)
Jelle Smet created HADOOP-9091:
--

 Summary: Allow daemon startup when at least 1 (or configurable) 
disk is in an OK state.
 Key: HADOOP-9091
 URL: https://issues.apache.org/jira/browse/HADOOP-9091
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 0.20.2
Reporter: Jelle Smet


The given example is for datanode disk definitions, but it should be applicable
to any configuration where a list of disks is provided.

I have multiple local disks defined for a datanode:
<property>
  <name>dfs.data.dir</name>
  <value>/data/01/dfs/dn,/data/02/dfs/dn,/data/03/dfs/dn,/data/04/dfs/dn,/data/05/dfs/dn,/data/06/dfs/dn</value>
  <final>true</final>
</property>

When one of those disks breaks and is unmounted, the mount point (such as
/data/03 in this example) becomes a regular directory which doesn't have the
permissions and directory structure Hadoop expects.
When this happens, the datanode fails to restart, even though we actually have
enough disks in an OK state to proceed.  The only way around this is to alter
the configuration and omit that specific disk.

In my opinion, it would be more practical to let Hadoop daemons start when at
least 1 disk/partition in the provided list is in a usable state.  This avoids
having to roll out custom configurations for systems which temporarily have a
disk (and therefore its directory layout) missing.  This might also be made
configurable, so that at least X partitions out of the available ones must be
in an OK state.







--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9091) Allow daemon startup when at least 1 (or configurable) disk is in an OK state.

2012-11-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9091.
-

Resolution: Fixed

This feature is already available in all our current releases via the DN volume
failure toleration properties. Please see
https://issues.apache.org/jira/browse/HDFS-1592.

Resolving as not a problem. Please upgrade to a release that includes this to
have it addressed in your environment.
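
For reference, the property in question is dfs.datanode.failed.volumes.tolerated
in hdfs-site.xml; a minimal sketch of how it would be set (the value of 1 is
just an example, not a recommendation):

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <!-- Number of volumes allowed to fail before the datanode stops offering
       service; the default of 0 means any volume failure shuts the DN down. -->
  <value>1</value>
</property>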

 Allow daemon startup when at least 1 (or configurable) disk is in an OK state.
 --

 Key: HADOOP-9091
 URL: https://issues.apache.org/jira/browse/HADOOP-9091
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 0.20.2
Reporter: Jelle Smet
  Labels: features, hadoop

 The given example is for datanode disk definitions, but it should be applicable
 to any configuration where a list of disks is provided.
 I have multiple local disks defined for a datanode:
 <property>
   <name>dfs.data.dir</name>
   <value>/data/01/dfs/dn,/data/02/dfs/dn,/data/03/dfs/dn,/data/04/dfs/dn,/data/05/dfs/dn,/data/06/dfs/dn</value>
   <final>true</final>
 </property>
 When one of those disks breaks and is unmounted, the mount point (such as
 /data/03 in this example) becomes a regular directory which doesn't have the
 permissions and directory structure Hadoop expects.
 When this happens, the datanode fails to restart, even though we actually have
 enough disks in an OK state to proceed.  The only way around this is to alter
 the configuration and omit that specific disk.
 In my opinion, it would be more practical to let Hadoop daemons start when at
 least 1 disk/partition in the provided list is in a usable state.  This avoids
 having to roll out custom configurations for systems which temporarily have a
 disk (and therefore its directory layout) missing.  This might also be made
 configurable, so that at least X partitions out of the available ones must be
 in an OK state.



[jira] [Created] (HADOOP-9092) Coverage fixing for org.apache.hadoop.mapreduce.jobhistory

2012-11-26 Thread Aleksey Gorshkov (JIRA)
Aleksey Gorshkov created HADOOP-9092:


 Summary: Coverage fixing for 
org.apache.hadoop.mapreduce.jobhistory 
 Key: HADOOP-9092
 URL: https://issues.apache.org/jira/browse/HADOOP-9092
 Project: Hadoop Common
  Issue Type: Test
  Components: tools
Reporter: Aleksey Gorshkov


Coverage fixing for package org.apache.hadoop.mapreduce.jobhistory 



wiki write access request

2012-11-26 Thread Glen Mazza
Hi, I'm going through the Hadoop deployment instructions 
(http://wiki.apache.org/hadoop/GettingStartedWithHadoop) and am 
occasionally seeing things that can be better clarified or need 
updating.  Can I have write access to the Wiki so I can make text 
updates?  I'm user GlenMazza.


Thanks,
Glen

--
Glen Mazza
Talend Community Coders - coders.talend.com
blog: www.jroller.com/gmazza



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-26 Thread Robert Evans
+1, +1, 0

On 11/24/12 2:13 PM, Matt Foley ma...@apache.org wrote:

For discussion, please see previous thread [PROPOSAL] introduce Python as
build-time and run-time dependency for Hadoop and throughout Hadoop
stack.

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent
scripting language for build-time tasks, and add Python as a build-time
dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with
either plug-ins or Groovy scripts to do cross-platform build-time tasks,
even under ant in Hadoop-1.
Please vote +1, 0, -1.

3. Contributors shall be allowed to use Python as a platform-independent
scripting language for run-time tasks, and add Python as a run-time
dependency.
Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
use Maven plug-ins or Groovy as the only means of cross-platform build-time
tasks, or to simply continue using platform-dependent scripts as is being
done today.

Vote closes at 12:30pm PST on Saturday 1 December.
-
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and
until those are worked out I don't want to delay moving to cross-platform
scripts for build-time tasks.

Best regards,
--Matt



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-26 Thread Adam Berry

0, +1, -1 (non-binding)

Also, it feels like the discussion should maybe have been kept open a little 
longer; the Thanksgiving holidays last week meant that people may have missed it.

Cheers,
Adam

On Nov 26, 2012, at 10:16 AM, Robert Evans wrote:

 +1, +1, 0
 
 On 11/24/12 2:13 PM, Matt Foley ma...@apache.org wrote:
 
 For discussion, please see previous thread [PROPOSAL] introduce Python as
 build-time and run-time dependency for Hadoop and throughout Hadoop
 stack.
 
 This vote consists of three separate items:
 
 1. Contributors shall be allowed to use Python as a platform-independent
 scripting language for build-time tasks, and add Python as a build-time
 dependency.
 Please vote +1, 0, -1.
 
 2. Contributors shall be encouraged to use Maven tasks in combination with
 either plug-ins or Groovy scripts to do cross-platform build-time tasks,
 even under ant in Hadoop-1.
 Please vote +1, 0, -1.
 
 3. Contributors shall be allowed to use Python as a platform-independent
 scripting language for run-time tasks, and add Python as a run-time
 dependency.
 Please vote +1, 0, -1.
 
 Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
 use Maven plug-ins or Groovy as the only means of cross-platform build-time
 tasks, or to simply continue using platform-dependent scripts as is being
 done today.
 
 Vote closes at 12:30pm PST on Saturday 1 December.
 -
 Personally, my vote is +1, +1, +1.
 I think #2 is preferable to #1, but still has many unknowns in it, and
 until those are worked out I don't want to delay moving to cross-platform
 scripts for build-time tasks.
 
 Best regards,
 --Matt
 



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-26 Thread Radim Kolar

-1, +1, -1


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-26 Thread Konstantin Boudnik
-1, +1, -1

Thanks

On Sat, Nov 24, 2012 at 12:13PM, Matt Foley wrote:
 For discussion, please see previous thread [PROPOSAL] introduce Python as
 build-time and run-time dependency for Hadoop and throughout Hadoop stack.
 
 This vote consists of three separate items:
 
 1. Contributors shall be allowed to use Python as a platform-independent
 scripting language for build-time tasks, and add Python as a build-time
 dependency.
 Please vote +1, 0, -1.
 
 2. Contributors shall be encouraged to use Maven tasks in combination with
 either plug-ins or Groovy scripts to do cross-platform build-time tasks,
 even under ant in Hadoop-1.
 Please vote +1, 0, -1.
 
 3. Contributors shall be allowed to use Python as a platform-independent
 scripting language for run-time tasks, and add Python as a run-time
 dependency.
 Please vote +1, 0, -1.
 
 Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
 use Maven plug-ins or Groovy as the only means of cross-platform build-time
 tasks, or to simply continue using platform-dependent scripts as is being
 done today.
 
 Vote closes at 12:30pm PST on Saturday 1 December.
 -
 Personally, my vote is +1, +1, +1.
 I think #2 is preferable to #1, but still has many unknowns in it, and
 until those are worked out I don't want to delay moving to cross-platform
 scripts for build-time tasks.
 
 Best regards,
 --Matt


Re: trailing whitespace

2012-11-26 Thread Radim Kolar

I've never understood why folks get worked up over a little trailing
whitespace here and there, since you can't see it and it doesn't affect
correctness. Spurious whitespace changes that make a review harder - those
are annoying. Trailing whitespace inadvertently left on lines where
legitimate changes were made in a patch - doesn't seem too harmful to me.


Trailing whitespace is annoying because:
- if your editor is set to kill it, it will produce a large patch.
- if you navigate to the end of a line, the cursor will not jump to the end of
the text but to some space after it; that costs you extra clicks for cursor
movement, and it is annoying when it lands on a wrapped line.
- it is good and standard practice to avoid it; git and other tools
highlight it in red.
- if you use "ignore whitespace" in git diff, it often produces a patch that
fails to apply.


Trailing whitespace can be stripped by a pre-commit hook.


[jira] [Resolved] (HADOOP-9066) Sorting for FileStatus[]

2012-11-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9066.
-

Resolution: Invalid

Since HADOOP-8934 is already adding FileStatus-data-based sorting in a place
that matters, and this JIRA seems to just add a simple example of using
FileStatus comparators, I am resolving this as Invalid for the moment, as the
example isn't of much value so far (given that the Javadoc for FileStatus is
already clear, and there's no use-case for this in MR, etc.).
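
For reference, a minimal sketch of the kind of comparator-based sorting being
discussed (illustrative only; it assumes an org.apache.hadoop.fs.FileStatus[]
such as the result of FileSystem#listStatus, sorted by modification time):

import java.util.Arrays;
import java.util.Comparator;

import org.apache.hadoop.fs.FileStatus;

public class FileStatusSortExample {
  // Sort a FileStatus[] in place, oldest modification time first.
  public static void sortByModificationTime(FileStatus[] statuses) {
    Arrays.sort(statuses, new Comparator<FileStatus>() {
      @Override
      public int compare(FileStatus a, FileStatus b) {
        return Long.compare(a.getModificationTime(), b.getModificationTime());
      }
    });
  }
}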

 Sorting for FileStatus[]
 

 Key: HADOOP-9066
 URL: https://issues.apache.org/jira/browse/HADOOP-9066
 Project: Hadoop Common
  Issue Type: Improvement
 Environment: java7, RedHat9, Hadoop 0.20.2, eclipse-jee-juno-linux-gtk.tar.gz
Reporter: david king
  Labels: patch
 Attachments: ConcreteFileStatusAscComparable.java, 
 ConcreteFileStatusDescComparable.java, FileStatusComparable.java, 
 FileStatusTool.java, TestFileStatusTool.java


   I will submit a patch with a FileStatusTool that can be used to sort
 FileStatus[] with a Comparator; the Comparator can not only be customized, but
 the example code can also be used directly.



Re: [VOTE] Hadoop Release 1.1.1

2012-11-26 Thread Arpit Gupta
+1, non binding

Ran system tests on secure and non-secure clusters and no new regressions were 
found.

--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/

On Nov 20, 2012, at 2:07 PM, Matt Foley ma...@apache.org wrote:

 Hello,
 Hadoop-1.1.1-rc0 is now available for evaluation and vote:
http://people.apache.org/~mattf/hadoop-1.1.1-rc0/
 or in the Nexus repository.
 
 The release notes are available at
http://people.apache.org/~mattf/hadoop-1.1.1-rc0/releasenotes.html
 
 20 bugs have been fixed, compared to release 1.1.0, with no backward
 incompatibilities.
 I took the opportunity to ensure that all branch-1.0 changes are in 1.1.1,
 and all branch-1.1 changes are in branch-1.
 The jira database has been made consistent.
 
 Please vote.  Voting will end on Tuesday 27 Nov., at 2:05pm PST.
 
 Thank you,
 --Matt Foley
 release manager



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-26 Thread Giridharan Kesavan
+1, +1, +1

-Giri


On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley ma...@apache.org wrote:

 For discussion, please see previous thread [PROPOSAL] introduce Python as
 build-time and run-time dependency for Hadoop and throughout Hadoop stack.

 This vote consists of three separate items:

 1. Contributors shall be allowed to use Python as a platform-independent
 scripting language for build-time tasks, and add Python as a build-time
 dependency.
 Please vote +1, 0, -1.

 2. Contributors shall be encouraged to use Maven tasks in combination with
 either plug-ins or Groovy scripts to do cross-platform build-time tasks,
 even under ant in Hadoop-1.
 Please vote +1, 0, -1.

 3. Contributors shall be allowed to use Python as a platform-independent
 scripting language for run-time tasks, and add Python as a run-time
 dependency.
 Please vote +1, 0, -1.

 Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
 use Maven plug-ins or Groovy as the only means of cross-platform build-time
 tasks, or to simply continue using platform-dependent scripts as is being
 done today.

 Vote closes at 12:30pm PST on Saturday 1 December.
 -
 Personally, my vote is +1, +1, +1.
 I think #2 is preferable to #1, but still has many unknowns in it, and
 until those are worked out I don't want to delay moving to cross-platform
 scripts for build-time tasks.

 Best regards,
 --Matt



Re: commit access to hadoop

2012-11-26 Thread Radim Kolar



The main feature is that when you get the +1 vote you yourself get to deal
with the grunge work of applying patches to one or more svn branches, resyncing
that with the git branches you inevitably do your own work on.

No, the main feature is a major speed advantage. It takes forever to get
something committed. I was annoyed with Apache Nutch last year and forked it;
here is a snapshot from the forked codebase:
http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
It is now 160k LOC on top of Apache Nutch 1.4. If I worked with these guys, it
would never have gotten done, because it took them 4 months to get a 200-line
patch reviewed.


Hadoop has a huge backlog of patches; you need way more committers than you
have today. I simply could not assign a person to work on Hadoop full-time,
because if he submitted a mere 5 patches per day, you would never be able to
process them.


Your current development process fails to scale. What are your plans for making
development move faster?


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-26 Thread Radim Kolar



In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
based. And the main reason was to remove saveVersion.sh.

Your #3 was not discussed in the proposal, was it?

It was part of the original proposal but not discussed much, because the
language war was the more attractive option. Do you want a vote like this?


1. Using an external language vs. a Maven plugin to build.
2. Using an external language for startup scripts vs. a JVM scripting language,
such as Jython as used in WebSphere.

3. Choosing Python as the external language.


Re: trailing whitespace

2012-11-26 Thread Aaron T. Myers
OK, if folks want to do something to get rid of trailing whitespace in the
project I won't object, but it doesn't seem like that big a deal to me. A
pre-commit hook makes sense to me. I just don't want to see the QA bot flag
patches containing trailing whitespace, thus requiring more round trips on
patches.

--
Aaron T. Myers
Software Engineer, Cloudera



On Mon, Nov 26, 2012 at 10:53 AM, Radim Kolar h...@filez.com wrote:

  I've never understood why folks get worked up over a little trailing
 whitespace here and there, since you can't see it and it doesn't affect
 correctness. Spurious whitespace changes that make a review harder - those
 are annoying. Trailing whitespace inadvertently left on lines where
 legitimate changes were made in a patch - doesn't seem too harmful to me.


 Trailing whitespace is annoying because:
 - if your editor is set to kill it, it will produce a large patch.
 - if you navigate to the end of a line, the cursor will not jump to the end
 of the text but to some space after it; that costs you extra clicks for cursor
 movement, and it is annoying when it lands on a wrapped line.
 - it is good and standard practice to avoid it; git and other tools
 highlight it in red.
 - if you use "ignore whitespace" in git diff, it often produces a patch that
 fails to apply.

 Trailing whitespace can be stripped by a pre-commit hook.



[jira] [Created] (HADOOP-9094) Add interface audience and stability annotation to PathExceptions

2012-11-26 Thread Suresh Srinivas (JIRA)
Suresh Srinivas created HADOOP-9094:
---

 Summary: Add interface audience and stability annotation to 
PathExceptions
 Key: HADOOP-9094
 URL: https://issues.apache.org/jira/browse/HADOOP-9094
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas


HADOOP-9093 moved path-related exceptions to o.a.h.fs. This JIRA tracks adding 
interface audience and stability annotations to those exceptions. It also 
tracks the comment from HADOOP-9093:

bq. I propose using FileNotFoundException instead of PathNotFoundException as 
it is already extensively used. Similarly use AccessControlException instead of 
PathAccessException. If folks agree, I will make that change in the next patch. 
Alternatively we could at least make these exceptions subclasses of the 
exception that I am proposing replacing them with.
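
For illustration, an annotated exception might look something like the sketch
below (the class is a stand-in, not the HADOOP-9094 patch, and the actual
audience/stability levels chosen may differ):

import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Stand-in example of annotating a path-related exception with
// audience and stability markers.
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class ExamplePathException extends IOException {
  public ExamplePathException(String path) {
    super("Problem with path: " + path);
  }
}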




[jira] [Resolved] (HADOOP-9095) TestNNThroughputBenchmark fails in branch-1

2012-11-26 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HADOOP-9095.


   Resolution: Fixed
Fix Version/s: 1-win
   1.2.0

I have committed this.  Thanks Jing!

 TestNNThroughputBenchmark fails in branch-1
 ---

 Key: HADOOP-9095
 URL: https://issues.apache.org/jira/browse/HADOOP-9095
 Project: Hadoop Common
  Issue Type: Bug
  Components: net
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Jing Zhao
Priority: Minor
 Fix For: 1.2.0, 1-win

 Attachments: HDFS-4204.b1.001.patch, HDFS-4204.b1.002.patch, 
 HDFS-4204.b1.003.patch


 {noformat}
 java.lang.StringIndexOutOfBoundsException: String index out of range: 0
 at java.lang.String.charAt(String.java:686)
 at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:539)
 at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:562)
 at 
 org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:88)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1047)
 ...
 at 
 org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$StatsDaemon.run(NNThroughputBenchmark.java:377)
 {noformat}
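
 For context, the trace shows charAt(0) being invoked on an empty hostname
 string inside NetUtils.normalizeHostName. A purely illustrative guard against
 blank entries (not the committed HDFS-4204 fix) could look like:

 import org.apache.hadoop.net.NetUtils;

 public class SafeHostNormalizer {
   // Skip null/empty hostnames instead of letting charAt(0) throw
   // StringIndexOutOfBoundsException; illustrative only.
   public static String normalize(String name) {
     if (name == null || name.isEmpty()) {
       return name;
     }
     return NetUtils.normalizeHostName(name);
   }
 }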



Re: Anybody know how to configure SSH for eclipse plugin

2012-11-26 Thread yiyu jia
In the file org.apache.hadoop.eclipse.server.HadoopServer.java (version 1.0.4)
I found the following comment. Does it mean I can set up an SSH tunnel so that
eclipse can connect to the remote cluster's main node? Any help will be
appreciated!

 * <p>
 * This class does not create any SSH connection anymore. Tunneling must be
 * setup outside of Eclipse for now (using Putty or <tt>ssh -D&lt;port&gt;
 * &lt;host&gt;</tt>)
 *



On Mon, Nov 26, 2012 at 7:17 PM, yiyu jia jia.y...@gmail.com wrote:


 Hi,

 Can anybody tell me how to configure SSH for the eclipse plugin? I guess the
 eclipse plugin uses SSH to connect with the Map/Reduce locations. But I found
 that it always uses my local machine's account name to connect to the hadoop
 host servers.

 thanks and regards,

 Yiyu



 --
 **
 * Mr. Jia Yiyu*
 *   *
 * Email: jia.y...@gmail.com  *
 *   *
 * Web: http://yiyujia.blogspot.com/*

 ***



Re: Anybody know how to configure SSH for eclipse plugin

2012-11-26 Thread yiyu jia
Hi all,

how can I make the eclipse plugin support an SSH connection that needs a
password (or uses a key or certificate stored somewhere)?

thanks in advance!

Yiyu


On Mon, Nov 26, 2012 at 9:08 PM, yiyu jia jia.y...@gmail.com wrote:

 In the file org.apache.hadoop.eclipse.server.HadoopServer.java (version 1.0.4)
 I found the following comment. Does it mean I can set up an SSH tunnel so that
 eclipse can connect to the remote cluster's main node? Any help will be
 appreciated!

  * <p>
  * This class does not create any SSH connection anymore. Tunneling must be
  * setup outside of Eclipse for now (using Putty or <tt>ssh -D&lt;port&gt;
  * &lt;host&gt;</tt>)
  *



 On Mon, Nov 26, 2012 at 7:17 PM, yiyu jia jia.y...@gmail.com wrote:


 Hi,

 Anybody tell me how to configure SSH for eclipse plugin? I guess eclipse
 plugin use SSH to connect with Map/Reduce locations. But, I found that it
 always use my local machine' s account name to connect with hadoop host
 servers.

 thanks and regards,

 Yiyu



 --
 **
 * Mr. Jia Yiyu*
 *   *
 * Email: jia.y...@gmail.com  *
 *   *
 * Web: http://yiyujia.blogspot.com/*

 ***






-- 
**
* Mr. Jia Yiyu*
*   *
* Email: jia.y...@gmail.com  *
*   *
* Web: http://yiyujia.blogspot.com/*
***


Refactor MetricsSystemImpl to allow for an on-demand publish system (HADOOP-9090)

2012-11-26 Thread Mostafa Elhemali
Hi all,
Yesterday I filed a JIRA (HADOOP-9090,
https://issues.apache.org/jira/browse/HADOOP-9090) to propose a refactoring of
the MetricsSystemImpl class - the default (only?) implementation of the
Metrics2 system - to factor out some common code into a base class and have
another simple implementation that just does on-demand publishing of metrics
instead of the default periodic publishing.
The main motivation for filing this JIRA and the attached patch is that we
(Microsoft) have a need to publish metrics out of short-lived processes
(think hadoop fs -ls) and the periodic behavior of the default
implementation doesn't really work well for those. We could write our own
metrics system implementation (and we'll probably do that in the short
term) but that would mean duplicating a lot of great code that's already in
the MetricsSystemImpl class, hence the proposal to factor out the common
code into a base class.

Does that sound reasonable? Please comment on the JIRA directly or reply
here - if the proposal sounds awful (or great) or there's something I'm
fundamentally missing I'd love to hear that feedback.
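
As a rough illustration of the shape being proposed, here is a hypothetical
sketch (the class names are made up and this is not the actual HADOOP-9090
patch): a base class owns the code shared with today's MetricsSystemImpl, and
the subclasses only decide when publishing happens.

// Hypothetical sketch only; names are invented for illustration.
abstract class AbstractMetricsSystemSketch {
  /** Shared logic: snapshot all registered sources and push to all sinks. */
  protected void publishMetricsNow() {
    // ...common snapshot/publish code factored out of MetricsSystemImpl...
  }
  /** Subclasses decide the publishing policy when the system starts. */
  protected abstract void startPublishing();
}

/** Today's behaviour: a timer periodically calls publishMetricsNow(). */
class PeriodicMetricsSystemSketch extends AbstractMetricsSystemSketch {
  @Override
  protected void startPublishing() {
    // schedule publishMetricsNow() every "period" seconds
  }
}

/** For short-lived processes (think "hadoop fs -ls"): publish once, on demand. */
class OnDemandMetricsSystemSketch extends AbstractMetricsSystemSketch {
  @Override
  protected void startPublishing() {
    // no timer; the caller invokes publishMetricsNow() right before exiting
  }
}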

Thanks!
Mostafa