Jenkins build is still unstable: Hadoop-Hdfs-0.23-Build #586

2013-04-19 Thread Apache Jenkins Server
See 



Hadoop-Hdfs-0.23-Build - Build # 586 - Still Unstable

2013-04-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/586/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 11800 lines...]
[INFO] Wrote classpath file 
'/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/target/classes/mrapp-generated-classpath'.
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ 
hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable 
package
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ 
hadoop-hdfs-project ---
[INFO] Installing 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/pom.xml
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-hdfs-project/0.23.8-SNAPSHOT/hadoop-hdfs-project-0.23.8-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @ hadoop-hdfs-project 
---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-dependency-plugin:2.1:build-classpath (build-classpath) @ 
hadoop-hdfs-project ---
[INFO] No dependencies found.
[INFO] Skipped writing classpath file 
'/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/target/classes/mrapp-generated-classpath'.
  No changes found.
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ 
hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable 
package
[INFO] 
[INFO] --- maven-checkstyle-plugin:2.6:checkstyle (default-cli) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- findbugs-maven-plugin:2.3.2:findbugs (default-cli) @ 
hadoop-hdfs-project ---
[INFO] ** FindBugsMojo execute ***
[INFO] canGenerate is false
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop HDFS  SUCCESS [4:51.290s]
[INFO] Apache Hadoop HttpFS .. SUCCESS [49.147s]
[INFO] Apache Hadoop HDFS Project  SUCCESS [0.060s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 5:41.098s
[INFO] Finished at: Fri Apr 19 11:39:19 UTC 2013
[INFO] Final Memory: 51M/739M
[INFO] 
+ /home/jenkins/tools/maven/latest/bin/mvn test 
-Dmaven.test.failure.ignore=true -Pclover 
-DcloverLicenseLocation=/home/jenkins/tools/clover/latest/lib/clover.license
Archiving artifacts
Recording test results
Build step 'Publish JUnit test result report' changed build result to UNSTABLE
Publishing Javadoc
Recording fingerprints
Updating MAPREDUCE-4383
Updating YARN-71
Updating YARN-476
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Unstable
Sending email for trigger: Unstable



###
## FAILED TESTS (if any) 
##
2 tests failed.
REGRESSION:  
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy2

Error Message:
Timed out waiting for corrupt replicas. Waiting for 2, but only found 0

Stack Trace:
java.util.concurrent.TimeoutException: Timed out waiting for corrupt replicas. 
Waiting for 2, but only found 0
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitCorruptReplicas(DFSTestUtil.java:330)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.blockCorruptionRecoveryPolicy(TestDatanodeBlockScanner.java:288)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.__CLR3_0_2t1dvac10c7(TestDatanodeBlockScanner.java:242)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy2(TestDatanodeBlockScanner.java:239)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.f

Hadoop-Hdfs-trunk - Build # 1377 - Still Failing

2013-04-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1377/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 14336 lines...]
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)

Running org.apache.hadoop.contrib.bkjournal.TestCurrentInprogress
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.686 sec
Running org.apache.hadoop.contrib.bkjournal.TestBookKeeperConfiguration
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.073 sec
Running org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.534 sec

Results :

Failed tests:   
testStandbyExceptionThrownDuringCheckpoint(org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints):
 SBN should have still been checkpointing.

Tests run: 32, Failures: 1, Errors: 0, Skipped: 0

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop HDFS  SUCCESS 
[1:29:28.170s]
[INFO] Apache Hadoop HttpFS .. SUCCESS [2:23.113s]
[INFO] Apache Hadoop HDFS BookKeeper Journal . FAILURE [59.363s]
[INFO] Apache Hadoop HDFS Project  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1:32:51.423s
[INFO] Finished at: Fri Apr 19 13:06:35 UTC 2013
[INFO] Final Memory: 48M/812M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12.3:test (default-test) on 
project hadoop-hdfs-bkjournal: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/target/surefire-reports
 for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-hdfs-bkjournal
Build step 'Execute shell' marked build as failure
Archiving artifacts
Updating MAPREDUCE-4932
Updating YARN-585
Updating YARN-493
Updating YARN-482
Updating MAPREDUCE-5163
Updating MAPREDUCE-5152
Updating HDFS-4434
Updating MAPREDUCE-4898
Updating YARN-514
Updating HADOOP-9486
Updating YARN-441
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

Build failed in Jenkins: Hadoop-Hdfs-trunk #1377

2013-04-19 Thread Apache Jenkins Server
See 

Changes:

[vinodkv] YARN-493. Fixed some shell related flaws in YARN on Windows. 
Contributed by Chris Nauroth.
HADOOP-9486. Promoted Windows and Shell related utils from YARN to Hadoop 
Common. Contributed by Chris Nauroth.

[vinodkv] YARN-441. Removed unused utility methods for collections from two API 
records. Contributed by Xuan Gong.
MAPREDUCE-5163. Update MR App to not use API utility methods for collections 
after YARN-441. Contributed by Xuan Gong.

[suresh] HDFS-4434. Provide a mapping from INodeId to INode. Contributed by 
Suresh Srinivas.

[tucu] MAPREDUCE-4932. mapreduce.job#getTaskCompletionEvents incompatible with 
Hadoop 1. (rkanter via tucu)

[vinodkv] MAPREDUCE-5152. Make MR App to simply pass through the container from 
RM instead of extracting and populating information itself to start any 
container. Contributed by Vinod Kumar Vavilapalli.

[tucu] MAPREDUCE-4898. FileOutputFormat.checkOutputSpecs and 
FileOutputFormat.setOutputPath incompatible with MR1. (rkanter via tucu)

[tucu] YARN-482. FS: Extend SchedulingMode to intermediate queues. (kkambatl 
via tucu)

[vinodkv] YARN-585. Fix failure in 
TestFairScheduler#testNotAllowSubmitApplication caused by YARN-514. Contributed 
by Zhijie Shen.

--
[...truncated 14143 lines...]
Generating [... repeated javadoc "Generating" lines; output file paths not captured in the archive ...]

Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-19 Thread Derek Dagit
+1 (non-binding)

Verified sigs and sums
built
ran some simple jobs on a single-node cluster
-- 
Derek

On Apr 12, 2013, at 16:56, Arun C Murthy wrote:

> Folks,
> 
> I've created a release candidate (RC2) for hadoop-2.0.4-alpha that I would 
> like to release.
> 
> The RC is available at: 
> http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc2/
> The RC tag in svn is here: 
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc2
> 
> The maven artifacts are available via repository.apache.org.
> 
> Please try the release and vote; the vote will run for the usual 7 days.
> 
> thanks,
> Arun
> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 



Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-19 Thread Tom White
+1

Checked sigs and checksums, source tag, and built from source.

Cheers,
Tom

On Fri, Apr 12, 2013 at 2:56 PM, Arun C Murthy  wrote:
> Folks,
>
> I've created a release candidate (RC2) for hadoop-2.0.4-alpha that I would 
> like to release.
>
> The RC is available at: 
> http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc2/
> The RC tag in svn is here: 
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc2
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>


Convention question for using DFSConfigKey constants : are zeros magic?

2013-04-19 Thread Jay Vyas
I recently looked into the HDFS source tree to determine its idioms, with
respect to a hairy debate about the threshold between what is and is not a
magic number, and it appears that the number zero is NOT considered magic -
at least not in the HDFS source code.

I found that DFSConfigKeys.java defines DEFAULT values of zero for some
fields, and those defaults result in a non-quantitative interpretation of
the field.

For example:
dfs.image.transfer.bandwidthPerSec

is commented like so:
public static final long DFS_IMAGE_TRANSFER_RATE_DEFAULT = 0;  //no
throttling

However in the implementation of these defaults, "magic" zeros are used
without commenting:
if (transferBandwidth > 0) {
  throttler = new DataTransferThrottler(transferBandwidth);
}
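
(For reference, pulling the two quoted fragments together - a rough sketch of
how the servlet code presumably reads the key and builds the throttler; this is
a simplification for illustration, not the verbatim HDFS source, and the *_KEY
constant name is assumed to mirror the *_DEFAULT constant quoted above:)

    // Read the configured rate, falling back to the commented default of 0.
    long transferBandwidth = conf.getLong(
        DFSConfigKeys.DFS_IMAGE_TRANSFER_RATE_KEY,      // assumed key constant
        DFSConfigKeys.DFS_IMAGE_TRANSFER_RATE_DEFAULT);  // 0, i.e. no throttling

    DataTransferThrottler throttler = null;
    if (transferBandwidth > 0) {
      // Any positive value enables throttling at that many bytes per second.
      throttler = new DataTransferThrottler(transferBandwidth);
    }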



It seems like the 0 above would be better replaced with
DFS_IMAGE_TRANSFER_RATE_DEFAULT, since the "no throttling" behaviour is
defined alongside the constant in the DFSConfigKeys file, not in
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java.




I'm trying to get a feel for whether there are conventions to enforce in some
code reviews for our Hadoop-dependent configuration code.  We'd like to follow
hadoopy idioms if possible.


-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-19 Thread Jonathan Eagles
+1 (non-binding)

Verified sigs and sums
built on ubuntu
ran some simple jobs on a single-node cluster


On Fri, Apr 19, 2013 at 9:20 AM, Tom White  wrote:

> +1
>
> Checked sigs and checksums, source tag, and built from source.
>
> Cheers,
> Tom
>
> On Fri, Apr 12, 2013 at 2:56 PM, Arun C Murthy 
> wrote:
> > Folks,
> >
> > I've created a release candidate (RC2) for hadoop-2.0.4-alpha that I
> would like to release.
> >
> > The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc2/
> > The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc2
> >
> > The maven artifacts are available via repository.apache.org.
> >
> > Please try the release and vote; the vote will run for the usual 7 days.
> >
> > thanks,
> > Arun
> >
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
>


Re: [VOTE] Release Apache Hadoop 0.23.7

2013-04-19 Thread Jonathan Eagles
+1 (non-binding)

Verified sigs and sums
built
ran some simple jobs on a single-node cluster


On Thu, Apr 18, 2013 at 3:16 PM, Thomas Graves wrote:

> Thanks everyone for trying 0.23.7 out and voting.
>
> The vote passes with 13 +1s (8 binding and 5 non-binding) and no -1s.
>
> I'll push the release.
>
> Tom
>
>
> On 4/11/13 2:55 PM, "Thomas Graves"  wrote:
>
> >I've created a release candidate (RC0) for hadoop-0.23.7 that I would like
> >to release.
> >
> >This release is a sustaining release with several important bug fixes in
> >it.
> >
> >The RC is available at:
> >http://people.apache.org/~tgraves/hadoop-0.23.7-candidate-0/
> >The RC tag in svn is here:
> >http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.7-rc0/
> >
> >The maven artifacts are available via repository.apache.org.
> >
> >Please try the release and vote; the vote will run for the usual 7 days.
> >
> >thanks,
> >Tom Graves
> >
>
>


Re: Convention question for using DFSConfigKey constants : are zeros magic?

2013-04-19 Thread Aaron T. Myers
Hi Jay,

On Sat, Apr 20, 2013 at 1:10 AM, Jay Vyas  wrote:

> I recently looked into the HDFS source tree to determine idioms with
> respect to a hairy debate about the threshold between what is and is not a
> magic number, and found that :  And it appears that the number zero is NOT
> considered magic - at least not in the HDFS source code.
>

It's certainly not magic in the Configuration class interpretation of it,
though I think if you surveyed the full source code you'd find that there
isn't much consistency with regard to ad hoc checks in the code for
certain values, like you've identified below.


>
> I found that that DFSConfigKeys.java defines DEFAULT values of zeros for
> some fields, and those defaults result in non-quantitative interpretation
> of the field.
>
> For example:
> dfs.image.transfer.bandwidthPerSec
>
> Is commented like so:
> public static final long DFS_IMAGE_TRANSFER_RATE_DEFAULT = 0;  //no
> throttling
>
> However in the implementation of these defaults, "magic" zeros are used
> without commenting:
> if (transferBandwidth > 0) {
>   throttler = new DataTransferThrottler(transferBandwidth);
> }
>
> 
>
> Seems like the 0 above would be better replaced with
> DFS_IMAGE_TRANSFER_RATE_DEFAULT since the "no throttling" behaviour is
> defined with the constant in the DFSConfigKeys file, and not defined in the
>
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java.
>

I don't agree with this. What if we later changed the default to something
greater than 0 in DFSConfigKeys? If the code were comparing against the
value DFS_IMAGE_TRANSFER_RATE_DEFAULT, the check in the code would then be
wrong. The only value for that config that should denote "no throttling" is
0, regardless of what the default is, so the explicit comparison against 0
makes sense to me.
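
(A small sketch of the failure mode described above, using a hypothetical new
default value purely for illustration:)

    // Hypothetical: suppose DFSConfigKeys later changed the default to 1 MB/s:
    //   public static final long DFS_IMAGE_TRANSFER_RATE_DEFAULT = 1048576;
    // A check written against the constant silently changes meaning:
    if (transferBandwidth > DFS_IMAGE_TRANSFER_RATE_DEFAULT) { // now "> 1 MB/s"
      throttler = new DataTransferThrottler(transferBandwidth);
    }
    // whereas the literal check keeps meaning "throttling enabled" regardless
    // of what the default happens to be:
    if (transferBandwidth > 0) {
      throttler = new DataTransferThrottler(transferBandwidth);
    }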


>
>
> 
>
> Trying to get a feel for if there are conventions  to enforce in some code
> reviews for our hadoop dependent configuration code.  We'd like to follow
> hadoopy idioms if possible..
>

I'd say the main convention you should concern yourself with for this
purpose is config setting naming, e.g. use consistent prefixes within your
own code, use lower case separated by dots and dashes, etc.
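
(A hypothetical example of such naming, invented here purely to illustrate the
convention - these keys do not exist in Hadoop:)

    // Consistent project prefix, lower case, dot-separated segments.
    public static final String MYAPP_IMAGE_SCAN_INTERVAL_KEY =
        "myapp.image.scan.interval-ms";
    public static final long MYAPP_IMAGE_SCAN_INTERVAL_DEFAULT = 0; // 0 == disabled

    public static final String MYAPP_IMAGE_SCAN_THREADS_KEY =
        "myapp.image.scan.threads";
    public static final int MYAPP_IMAGE_SCAN_THREADS_DEFAULT = 1;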


>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>


Slow region server recoveries due to lease recovery going to stale data node

2013-04-19 Thread Ted Yu
I think the issue would be more appropriate for the hdfs-dev@ mailing list.

Putting user@hbase as Bcc.

-- Forwarded message --
From: Varun Sharma 
Date: Fri, Apr 19, 2013 at 1:10 PM
Subject: Re: Slow region server recoveries
To: u...@hbase.apache.org


This is 0.94.3 hbase...


On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma  wrote:

> Hi Ted,
>
> I had a long offline discussion with Nicholas on this. Looks like the last
> block which was still being written to took an enormous time to recover.
> Here's what happened.
> a) Master split tasks and region servers process them
> b) Region server tries to recover lease for each WAL log - most cases are
> noop since they are already rolled over/finalized
> c) The last file lease recovery takes some time since the crashing server
> was writing to it and had a lease on it - but basically we have the lease 1
> minute after the server was lost
> d) Now we start the recovery for this but we end up hitting the stale data
> node which is puzzling.
>
> It seems that we did not hit the stale datanode when we were trying to
> recover the finalized WAL blocks with trivial lease recovery. However, for
> the final block, we hit the stale datanode. Any clue why this might be
> happening ?
>
> Varun
>
>
> On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu  wrote:
>
>> Can you show snippet from DN log which mentioned UNDER_RECOVERY ?
>>
>> Here is the criteria for stale node checking to kick in (from
>>
>>
https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch
>> ):
>>
>> +   * Check if the datanode is in stale state. Here if
>> +   * the namenode has not received heartbeat msg from a
>> +   * datanode for more than staleInterval (default value is
>> +   * {@link
>> DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
>> +   * the datanode will be treated as stale node.
>>
>>
>> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma 
>> wrote:
>>
>> > Is there a place to upload these logs ?
>> >
>> >
>> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma 
>> > wrote:
>> >
>> > > Hi Nicholas,
>> > >
>> > > Attached are the namenode, dn logs (of one of the healthy replicas of the
>> > > WAL block) and the rs logs which got stuck doing the log split. Action
>> > > begins at 2013-04-19 00:27*.
>> > >
>> > > Also, the rogue block is 5723958680970112840_174056. It's very interesting
>> > > to trace this guy through the HDFS logs (dn and nn).
>> > >
>> > > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS? Also does
>> > > the stale node stuff kick in for that state?
>> > >
>> > > Thanks
>> > > Varun
>> > >
>> > >
>> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon wrote:
>> > >
>> > >> Thanks for the detailed scenario and analysis. I'm going to have a look.
>> > >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
>> > >> timeouts), could you please send them directly to me?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Nicolas
>> > >>
>> > >>
>> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
>> > >> wrote:
>> > >>
>> > >> > Hi Nicholas,
>> > >> >
>> > >> > Here is the failure scenario, I have dug up the logs.
>> > >> >
>> > >> > A machine fails and stops accepting/transmitting traffic. The HMaster
>> > >> > starts the distributed split for 13 tasks. There are 12 region servers.
>> > >> > 12 tasks succeed but the 13th one takes a looong time.
>> > >> >
>> > >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
>> > >> > seconds. Both patches are there.
>> > >> >
>> > >> > a) Machine fails around 27:30
>> > >> > b) Master starts the split around 27:40 and submits the tasks. The one
>> > >> > task which fails seems to be the one which contains the WAL being
>> > >> > currently written to:
>> > >> >
>> > >> > 2013-04-19 00:27:44,325 INFO
>> > >> > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
>> > >> > hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141,
>> > >> > length=0
>> > >> >
>> > >> > Basically this region server picks up the task but finds the length of
>> > >> > this file to be 0 and drops. This happens again
>> > >> >
>> > >> > c) Finally another region server picks up the task but it ends up going
>> > >> > to the bad datanode (which should not happen because of the stale node
>> > >> > timeout). Unfortunately it hits the 45 retries and a connect timeout of
>> > >> > 20 seconds every time. This delays recovery significantly. Now I guess
>> > >> > reducing # of retries to 1 is one possibility.
>> > >> > But then the namenode logs are very interesting.
>> > >> >
>> > >> > d) Namenode seems to be in cyclic lease recovery loop until the node
>> > >> > is marked dead. Th

Meaning of UNDER_RECOVERY blocks

2013-04-19 Thread Varun Sharma
Hi,

I had an instance where a datanode died while writing a block. I am using
Hadoop 2.0 patched with HDFS-3703 for stale node detection every 20 seconds.
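
(For reference, the HDFS-3703 staleness criterion amounts to roughly the
following - a sketch with approximate method names, not the verbatim 2.0 code:)

    // A datanode is treated as stale once the namenode has gone longer than
    // the configured stale interval (20 seconds here) without a heartbeat.
    boolean isStale(DatanodeInfo node, long staleIntervalMs) {
      long millisSinceHeartbeat = Time.now() - node.getLastUpdate();
      return millisSinceHeartbeat >= staleIntervalMs;
    }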

Looking at the namenode logs, the block being written to went into the
UNDER_RECOVERY state, and there were several internalRecoverLease() calls
because there were readers on that block. I had a couple of questions about
the code:

1) I see that when a block is UNDER_RECOVERY, it is added to recoverBlocks
for each DatanodeDescriptor that holds the block. Then a recoverBlock call
is issued to each primary data node. What does the recoverBlock call do on
a datanode - does it sync the block on that node to the other 2 data nodes?
In my case one of the data nodes is unreachable; what is the behaviour in
such a case?

2) When a client wants to read a block which is "UNDER_RECOVERY" - do we
continue to suggest all 3 data nodes as replicas for reads, or do we pick
the one which is marked as primary for the block recovery?

Thanks


[jira] [Created] (HDFS-4718) TestHDFSCLI regexps reject valid user names

2013-04-19 Thread Andrew Purtell (JIRA)
Andrew Purtell created HDFS-4718:


 Summary: TestHDFSCLI regexps reject valid user names
 Key: HDFS-4718
 URL: https://issues.apache.org/jira/browse/HDFS-4718
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.0.5-beta
Reporter: Andrew Purtell
Priority: Minor
 Attachments: 4718-branch-2.patch

The regular expressions used by TestHDFSCLI reject some valid user names, e.g. 
"ec2-user". On branch-2, user names containing digits will also be rejected. On 
trunk, an attempt was made to address the latter problem but the change may 
contain an unintended typographical error.
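
(For illustration only - a hypothetical pattern that would accept names such as
"ec2-user" and names containing digits; this is not the expression from the
attached patch:)

    import java.util.regex.Pattern;

    public class UserNamePatternExample {
      // Allows letters, digits, dots, underscores and hyphens after the first
      // character, so "ec2-user" and "hdfs2" both match.
      private static final Pattern USER_NAME =
          Pattern.compile("[A-Za-z_][A-Za-z0-9._-]*");

      public static void main(String[] args) {
        System.out.println(USER_NAME.matcher("ec2-user").matches()); // true
        System.out.println(USER_NAME.matcher("hdfs2").matches());    // true
      }
    }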

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Meaning of UNDER_RECOVERY blocks

2013-04-19 Thread Varun Sharma
It would be nice if someone could help out with this - it looks like a trivial
question - but it seems like some blocks are being lost for us when datanodes
fail...

Varun


On Fri, Apr 19, 2013 at 2:28 PM, Varun Sharma  wrote:

> Hi,
>
> I had an instance where a datanode died while writing the block I am using
> Hadoop 2.0 patched with HDFS 3703 for stale node detection every 20 seconds.
>
> The block being written to, went into the UNDER_RECOVERY state looking at
> the namenode logs and there were several internalRecoverLease() calls
> because there were readers on that blcok. I had a couple of questions about
> the code;
>
> 1) I see that when a block is UNDER_RECOVERY, it is added to recoverBlocks
> for each dataNodeDescriptor that holds the block. Then a recoverBlock call
> is issued to each primary data node. What does the recoverBlock call do on
> a datanode - does it sync the block on that node to other 2 data nodes. In
> my case one of the data node is unreachable, what is the behaviour in such
> a case ?
>
> 2) When a client wants to read a block which is "UNDER_RECOVERY" - do we
> continue to suggest all 3 data nodes as replicas for reads or we pick the
> one which is marked as primary for the block recovery ?
>
> Thanks
>


[jira] [Resolved] (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2013-04-19 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li resolved HDFS-487.
-

Resolution: Duplicate

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose a id that uniquely identifies a file. This helps in 
> developing  applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use fileid instead of filename.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Convention question for using DFSConfigKey constants : are zeros magic?

2013-04-19 Thread Jay Vyas
Thanks! Okay, that helps to clarify things.

Okay, so the value in referencing the static constant is that it is
commented in the DFSConfigKeys file and declared as 0.  Having the == 0 in
code defines this default behaviour implicitly, so a change to the code
would make that code inconsistent with the comment in the DFSConfigKeys
file.  Having run into some tricky configuration changes in the past, it
concerns me a little... but...

The more generic question is whether or not there is enforcement of naming
conventions or commenting for special values in numeric configuration
parameters: I'm interpreting your answer as "No"...?

That's fine.. but just clarifying :) ?

On Fri, Apr 19, 2013 at 2:23 PM, Aaron T. Myers  wrote:

> Hi Jay,
>
> On Sat, Apr 20, 2013 at 1:10 AM, Jay Vyas  wrote:
>
> > I recently looked into the HDFS source tree to determine idioms with
> > respect to a hairy debate about the threshold between what is and is not
> a
> > magic number, and found that :  And it appears that the number zero is
> NOT
> > considered magic - at least not in the HDFS source code.
> >
>
> It's certainly not magic in the Configuration class interpretation of it,
> though I think if you surveyed the full source code you'd find that there
> won't be much consistency with regard to ad hoc checks in the code for
> certain values, like you've identified below.
>
>
> >
> > I found that that DFSConfigKeys.java defines DEFAULT values of zeros for
> > some fields, and those defaults result in non-quantitative interpretation
> > of the field.
> >
> > For example:
> > dfs.image.transfer.bandwidthPerSec
> >
> > Is commented like so:
> > public static final long DFS_IMAGE_TRANSFER_RATE_DEFAULT = 0;  //no
> > throttling
> >
> > However in the implementation of these defaults, "magic" zeros are used
> > without commenting:
> > if (transferBandwidth > 0) {
> >   throttler = new DataTransferThrottler(transferBandwidth);
> > }
> >
> > 
> >
> > Seems like the 0 above would be better replaced with
> > DFS_IMAGE_TRANSFER_RATE_DEFAULT since the "no throttling" behaviour is
> > defined with the constant in the DFSConfigKeys file, and not defined in
> the
> >
> >
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java.
> >
>
> I don't agree with this. What if we later changed the default to something
> greater than 0 in DFSConfigKeys? If the code were comparing against the
> value DFS_IMAGE_TRANSFER_RATE_DEFAULT, the check in the code would then be
> wrong. The only value for that config that should denote "no throttling" is
> 0, regardless of what the default is, so the explicit comparison against 0
> makes sense to me.
>
>
> >
> >
> > 
> >
> > Trying to get a feel for if there are conventions  to enforce in some
> code
> > reviews for our hadoop dependent configuration code.  We'd like to follow
> > hadoopy idioms if possible..
> >
>
> I'd say the main conventions you should concern yourself with for this
> purpose is config setting naming, e.g. use consistent prefixes within your
> own code, use lower case separated by dots.and-dashes, etc.
>
>
> >
> >
> > --
> > Jay Vyas
> > http://jayunit100.blogspot.com
> >
>



-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-19 Thread Jason Lowe

+1 (binding)

- verified signatures and checksums
- installed single-node cluster from binaries and ran sample jobs
- built and installed single-node cluster from source and ran sample jobs

Jason

On 04/12/2013 04:56 PM, Arun C Murthy wrote:

Folks,

I've created a release candidate (RC2) for hadoop-2.0.4-alpha that I would like 
to release.

The RC is available at: 
http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc2/
The RC tag in svn is here: 
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc2

The maven artifacts are available via repository.apache.org.

Please try the release and vote; the vote will run for the usual 7 days.

thanks,
Arun


--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/