[jira] [Created] (HDFS-4939) Retain old edits log, don't retain all minimum required logs

2013-06-25 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4939:
-

 Summary: Retain old edits log, don't retain all minimum required 
logs
 Key: HDFS-4939
 URL: https://issues.apache.org/jira/browse/HDFS-4939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: Fengdong Yu
Priority: Minor




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4938) Reduce redundant information in edit logs and image files

2013-06-25 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-4938:
---

 Summary: Reduce redundant information in edit logs and image files
 Key: HDFS-4938
 URL: https://issues.apache.org/jira/browse/HDFS-4938
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


Generation stamps are logged as edits and in image files on checkpoint. This is 
potentially redundant, as the generation stamp is also logged with block 
creation/append. This JIRA is to investigate and remove any redundant fields.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4937) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom()

2013-06-25 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-4937:


 Summary: ReplicationMonitor can infinite-loop in 
BlockPlacementPolicyDefault#chooseRandom()
 Key: HDFS-4937
 URL: https://issues.apache.org/jira/browse/HDFS-4937
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.23.8, 2.0.4-alpha
Reporter: Kihwal Lee


When a large number of nodes are removed by refreshing node lists, the network 
topology is updated. If the refresh happens at the right moment, the 
replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
This is because the old cluster size is used in the loop's termination check. 
This usually happens when a block with a high replication factor is being 
processed. Since replicas/rack is calculated beforehand, no node choice may 
satisfy the goodness criteria if the refresh removed racks. 

All nodes will end up in the excluded list, but its size will still be less 
than the previously recorded cluster size, so the loop never terminates. This 
has been seen in a production environment.
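
A minimal, self-contained sketch of the failure mode (hypothetical names, not 
the actual BlockPlacementPolicyDefault code): the loop bound is captured before 
the topology refresh, so once the refresh shrinks the cluster, the excluded 
list can never grow to the stale size and the loop never exits.

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class StaleSizeLoop {
  public static void main(String[] args) {
    List<String> nodes = Arrays.asList("dn1", "dn2", "dn3"); // cluster after refresh
    int staleClusterSize = 10;                               // size recorded before refresh
    Set<String> excluded = new HashSet<String>();
    Random rand = new Random();
    // Mirrors chooseRandom()'s termination check: this spins forever, because
    // excluded.size() can never exceed 3 while the stale bound is 10.
    while (excluded.size() < staleClusterSize) {
      String candidate = nodes.get(rand.nextInt(nodes.size()));
      // goodness checks elided; assume no remaining candidate satisfies them
      excluded.add(candidate);
    }
  }
}
{code}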

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: A question for txid

2013-06-25 Thread Azuryy Yu
Thanks Harsh, Todd.

After 200 million years, when spacemen manage the Earth, they will still know
Hadoop, but they won't be able to restart it; after a hard debugging session
they will find that the txid overflowed many years earlier.

--Sent from my Sony mobile.
On Jun 25, 2013 10:52 PM, "Todd Lipcon"  wrote:

> I did some back of the envelope math when implementing txids, and
> determined that overflow is not ever going to happen... A "busy" namenode
> does 1000 write transactions/second (2^10). MAX_LONG is 2^63. So, at 1k tps,
> we can run for 2^(63-10) = 2^53 seconds. A year is about 2^25 seconds, so
> you can run your namenode for 2^(63-10-25) = 2^28, about 268 million years.
>
> Hadoop is great software and I'm sure it will be around for years to come,
> but if it's still running in 268 million years, that will be a pretty
> depressing rate of technological progress!
>
> -Todd
>
> On Tue, Jun 25, 2013 at 6:14 AM, Harsh J  wrote:
>
> > Yes, it logically can if there have been as many transactions (it's a
> > very, very large number to reach, though).
> >
> > Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.
> >
> > I hacked up my local NN's txids manually to go very large (close to
> > max) and decided to try out if this causes any harm. I basically
> > bumped up the freshly formatted starting txid to 9223372036854775805
> > (and ensured image references the same):
> >
> > ➜  current  ls
> > VERSION
> > fsimage_9223372036854775805.md5
> > fsimage_9223372036854775805
> > seen_txid
> > ➜  current  cat seen_txid
> > 9223372036854775805
> >
> > NameNode started up as expected.
> >
> > 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
> > in 0 seconds.
> > 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
> > 9223372036854775805 from
> > /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> > 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
> > 9223372036854775806
> >
> > I could create a bunch of files and do regular ops (the txid counter
> > incremented well past the long max). I created over 100 files, just to
> > make it go well over Long.MAX_VALUE.
> >
> > Quitting NameNode and restarting fails though, with the following error:
> >
> > 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> > java.io.IOException: Gap in transactions. Expected to be able to read
> > up until at least txid 9223372036854775806 but unable to find any edit
> > logs containing txid -9223372036854775808
> >
> > So it looks like it cannot currently handle an overflow.
> >
> > I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
> > this. I don't think this is of immediate concern though, so we should
> > be able to address it in the future (unless there are parts of the code
> > that already prevent reaching this number in the first place -
> > please do correct me if there is such a part).
> >
> > On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu  wrote:
> > > Hi dear All,
> > >
> > > The txid is currently a long type,
> > >
> > > FSImage.java:
> > >
> > > boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> > > throws IOException{
> > >
> > >   editLog.setNextTxId(lastAppliedTxId + 1L);
> > > }
> > >
> > > Is it possible that (lastAppliedTxId + 1L) exceeds Long.MAX_VALUE?
> >
> >
> >
> > --
> > Harsh J
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


[jira] [Resolved] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4936.
---

Resolution: Not A Problem

> Handle overflow condition for txid going over Long.MAX_VALUE
> 
>
> Key: HDFS-4936
> URL: https://issues.apache.org/jira/browse/HDFS-4936
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Minor
>
> Hat tip to [~fengdon...@gmail.com] for the question that led to this (on the 
> mailing lists).
> I hacked up my local NN's txids manually to go very large (close to max) and 
> decided to try out if this causes any harm. I basically bumped up the freshly 
> formatted files' starting txid to 9223372036854775805 (and ensured image 
> references the same by hex-editing it):
> {code}
> ➜  current  ls
> VERSION
> fsimage_9223372036854775805.md5
> fsimage_9223372036854775805
> seen_txid
> ➜  current  cat seen_txid
> 9223372036854775805
> {code}
> NameNode started up as expected.
> {code}
> 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
> seconds.
> 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
> 9223372036854775805 from 
> /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
> 9223372036854775806
> {code}
> I could create a bunch of files and do regular ops (the txid counter 
> incremented well past the long max). I created over 10 files, just to make it 
> go well over Long.MAX_VALUE.
> Quitting NameNode and restarting fails though, with the following error:
> {code}
> 13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
> segments in 
> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
> 13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
>  -> 
> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
> 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> java.io.IOException: Gap in transactions. Expected to be able to read up 
> until at least txid 9223372036854775806 but unable to find any edit logs 
> containing txid -9223372036854775808
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
> {code}
> Looks like we also lose some edits when we restart, as noted by the finalized 
> edits filename:
> {code}
> VERSION
> edits_9223372036854775806-9223372036854775807
> fsimage_9223372036854775805
> fsimage_9223372036854775805.md5
> seen_txid
> {code}
> It seems like we won't be able to handle the case where the txid overflows. 
> It's a very, very large number, so that's not an immediate concern, but it 
> seemed worthy of a report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: A question for txid

2013-06-25 Thread Todd Lipcon
I did some back of the envelope math when implementing txids, and
determined that overflow is not ever going to happen... A "busy" namenode
does 1000 write transactions/second (2^10). MAX_LONG is 2^63. So, at 1k tps,
we can run for 2^(63-10) = 2^53 seconds. A year is about 2^25 seconds, so
you can run your namenode for 2^(63-10-25) = 2^28, about 268 million years.
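
A quick sanity check of that estimate with exact arithmetic (an illustrative
snippet, not Hadoop code):

public class TxidHorizon {
  public static void main(String[] args) {
    long maxTxid = Long.MAX_VALUE;               // 2^63 - 1 transactions
    long tps = 1000;                             // "busy" namenode, ~2^10
    long secondsPerYear = 365L * 24 * 60 * 60;   // 31,536,000, roughly 2^25
    System.out.println(maxTxid / tps / secondsPerYear); // prints 292471208
  }
}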

Hadoop is great software and I'm sure it will be around for years to come,
but if it's still running in 268 million years, that will be a pretty
depressing rate of technological progress!

-Todd

On Tue, Jun 25, 2013 at 6:14 AM, Harsh J  wrote:

> Yes, it logically can if there have been as many transactions (it's a
> very, very large number to reach, though).
>
> Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.
>
> I hacked up my local NN's txids manually to go very large (close to
> max) and decided to try out if this causes any harm. I basically
> bumped up the freshly formatted starting txid to 9223372036854775805
> (and ensured image references the same):
>
> ➜  current  ls
> VERSION
> fsimage_9223372036854775805.md5
> fsimage_9223372036854775805
> seen_txid
> ➜  current  cat seen_txid
> 9223372036854775805
>
> NameNode started up as expected.
>
> 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
> in 0 seconds.
> 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
> 9223372036854775805 from
> /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
> 9223372036854775806
>
> I could create a bunch of files and do regular ops (the txid counter
> incremented well past the long max). I created over 100 files, just to
> make it go well over Long.MAX_VALUE.
>
> Quitting NameNode and restarting fails though, with the following error:
>
> 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> java.io.IOException: Gap in transactions. Expected to be able to read
> up until at least txid 9223372036854775806 but unable to find any edit
> logs containing txid -9223372036854775808
>
> So it looks like it cannot currently handle an overflow.
>
> I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
> this. I don't think this is of immediate concern though, so we should
> be able to address it in the future (unless there are parts of the code
> that already prevent reaching this number in the first place -
> please do correct me if there is such a part).
>
> On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu  wrote:
> > Hi dear All,
> >
> > The txid is currently a long type,
> >
> > FSImage.java:
> >
> > boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> > throws IOException{
> >
> >   editLog.setNextTxId(lastAppliedTxId + 1L);
> > }
> >
> > Is it possible that (lastAppliedTxId + 1L) exceeds Long.MAX_VALUE?
>
>
>
> --
> Harsh J
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: A question for txid

2013-06-25 Thread Harsh J
Yes, it logically can if there have been as many transactions (it's a
very, very large number to reach, though).

Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.

I hacked up my local NN's txids manually to go very large (close to
max) and decided to try out if this causes any harm. I basically
bumped up the freshly formatted starting txid to 9223372036854775805
(and ensured image references the same):

➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805

NameNode started up as expected.

13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
in 0 seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
9223372036854775805 from
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
9223372036854775806

I could create a bunch of files and do regular ops (the txid counter
incremented well past the long max). I created over 100 files, just to
make it go well over Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read
up until at least txid 9223372036854775806 but unable to find any edit
logs containing txid -9223372036854775808

So it looks like it cannot currently handle an overflow.
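
The negative txid in that message is just Java long wraparound; a tiny
illustration (not HDFS code):

public class TxidWrap {
  public static void main(String[] args) {
    long txid = Long.MAX_VALUE;      // 9223372036854775807
    System.out.println(txid + 1L);   // prints -9223372036854775808, the
  }                                  // txid the NameNode cannot find above
}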

I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
this. I don't think this is of immediate concern though, so we should
be able to address it in the future (unless there are parts of the code
that already prevent reaching this number in the first place -
please do correct me if there is such a part).

On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu  wrote:
> Hi dear All,
>
> The txid is currently a long type,
>
> FSImage.java:
>
> boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> throws IOException{
>
>   editLog.setNextTxId(lastAppliedTxId + 1L);
> }
>
> Is it possible that (lastAppliedTxId + 1L) exceeds Long.MAX_VALUE?



-- 
Harsh J


[jira] [Created] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)
Harsh J created HDFS-4936:
-

 Summary: Handle overflow condition for txid going over 
Long.MAX_VALUE
 Key: HDFS-4936
 URL: https://issues.apache.org/jira/browse/HDFS-4936
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


Hat tip to [~fengdon...@gmail.com] for the question that led to this (on the 
mailing lists).

I hacked up my local NN's txids manually to go very large (close to max) and 
decided to try out if this causes any harm. I basically bumped up the freshly 
formatted files' starting txid to 9223372036854775805 (and ensured image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (the txid counter 
incremented well past the long max). I created over 10 files, just to make it go 
well over Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
segments in 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
 -> 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until 
at least txid 9223372036854775806 but unable to find any edit logs containing 
txid -9223372036854775808
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}

Looks like we also lose some edits when we restart, as noted by the finalized 
edits filename:

{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}

It seems like we won't be able to handle the case where the txid overflows. 
It's a very, very large number, so that's not an immediate concern, but it 
seemed worthy of a report.
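
Purely as an illustration of one possible mitigation ({{setNextTxId}} is the 
real call site from FSImage#loadFSImage; the guard itself is hypothetical and 
not in the current code), we could fail fast instead of silently wrapping to a 
negative txid:

{code}
// Hypothetical guard, not present in the current code:
if (lastAppliedTxId == Long.MAX_VALUE) {
  throw new IOException("txid space exhausted; cannot assign a transaction "
      + "ID beyond Long.MAX_VALUE");
}
editLog.setNextTxId(lastAppliedTxId + 1L);
{code}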

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hadoop-Hdfs-trunk #1441

2013-06-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1441/

Changes:

[cnauroth] HDFS-4927. CreateEditsLog creates inodes with an invalid inode ID, 
which then cannot be loaded by a namenode. Contributed by Chris Nauroth.

[cmccabe] HADOOP-9355.  Abstract symlink tests to use either FileContext or 
FileSystem.  (Andrew Wang via Colin Patrick McCabe)

[tucu] HADOOP-9661. Allow metrics sources to be extended. (sandyr via tucu)

[tucu] YARN-736. Add a multi-resource fair sharing metric. (sandyr via tucu)

[cmccabe] HADOOP-9439.  JniBasedUnixGroupsMapping: fix some crash bugs (Colin 
Patrick McCabe)

--
[...truncated 11041 lines...]
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.423 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfo
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.151 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.198 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.059 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestCorruptReplicaInfo
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.226 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.219 sec
Running 
org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlockQueues
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.165 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.084 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.671 sec
Running 
org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithNodeGroup
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.858 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeDescriptor
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.146 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.472 sec
Running org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
Tests run: 20, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.171 sec
Running org.apache.hadoop.hdfs.TestHftpURLTimeouts
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.945 sec
Running org.apache.hadoop.hdfs.TestClose
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.036 sec
Running org.apache.hadoop.hdfs.TestDatanodeDeath
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.284 sec
Running org.apache.hadoop.hdfs.TestFileAppend
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.385 sec
Running org.apache.hadoop.hdfs.TestFileCreationEmpty
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.192 sec
Running org.apache.hadoop.hdfs.TestDatanodeBlockScanner
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.054 sec
Running org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.537 sec
Running org.apache.hadoop.hdfs.TestDeprecatedKeys
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.28 sec
Running org.apache.hadoop.hdfs.TestFileAppend2
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.853 sec
Running org.apache.hadoop.hdfs.TestDFSUpgrade
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.13 sec
Running org.apache.hadoop.hdfs.TestFileCorruption
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.641 sec
Running org.apache.hadoop.hdfs.TestFileAppendRestart
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.191 sec
Running org.apache.hadoop.hdfs.TestHDFSFileSystemContract
Tests run: 44, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.534 sec
Running org.apache.hadoop.hdfs.TestQuota
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.973 sec
Running org.apache.hadoop.hdfs.TestDFSAddressConfig
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.855 sec
Running org.apache.hadoop.hdfs.TestAppendDifferentChecksum
Tests run: 3, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 8.374 sec
Running org.apache.hadoop.hdfs.TestParallelUnixDomainRead
Tests run: 4, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 0.163 sec
Running org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.802 sec
Running org.apache.hadoop.hdfs.tools.TestGetConf
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.696 sec
Running 
org.apache

Hadoop-Hdfs-trunk - Build # 1441 - Still Failing

2013-06-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1441/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 11234 lines...]
[WARNING] Failed to retrieve plugin descriptor for 
org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin 
org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not be 
resolved: Failed to read artifact descriptor for 
org.eclipse.m2e:lifecycle-mapping:jar:1.0.0
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ hadoop-hdfs-project 
---
[INFO] Deleting 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/target
[INFO] 
[INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @ hadoop-hdfs-project 
---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/target/test-dir
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:test-jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0:enforce (dist-enforce) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ 
hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable 
package
[INFO] 
[INFO] --- maven-checkstyle-plugin:2.6:checkstyle (default-cli) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- findbugs-maven-plugin:2.3.2:findbugs (default-cli) @ 
hadoop-hdfs-project ---
[INFO] ** FindBugsMojo execute ***
[INFO] canGenerate is false
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop HDFS  FAILURE 
[1:30:44.464s]
[INFO] Apache Hadoop HttpFS .. SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal . SKIPPED
[INFO] Apache Hadoop HDFS Project  SUCCESS [2.667s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1:30:47.952s
[INFO] Finished at: Tue Jun 25 13:11:09 UTC 2013
[INFO] Final Memory: 39M/393M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12.3:test (default-test) on 
project hadoop-hdfs: ExecutionException; nested exception is 
java.util.concurrent.ExecutionException: java.lang.RuntimeException: The forked 
VM terminated without saying properly goodbye. VM crash or System.exit called ? 
-> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Build step 'Execute shell' marked build as failure
Archiving artifacts
Updating HADOOP-9661
Updating HDFS-4927
Updating HADOOP-9355
Updating HADOOP-9439
Updating YARN-736
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Created] (HDFS-4934) add symlink support to WebHDFS server side

2013-06-25 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-4934:


 Summary: add symlink support to WebHDFS server side
 Key: HDFS-4934
 URL: https://issues.apache.org/jira/browse/HDFS-4934
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.2.0
 Environment: followup on HADOOP-8040
Reporter: Alejandro Abdelnur


follow up on HADOOP-8040

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4935) add symlink support to HttpFS server side

2013-06-25 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-4935:


 Summary: add symlink support to HttpFS server side
 Key: HDFS-4935
 URL: https://issues.apache.org/jira/browse/HDFS-4935
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: followup on HADOOP-8040
Reporter: Alejandro Abdelnur


follow up on HADOOP-8040

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4933) add symlink support to WebHDFS to HTTP REST API & client filesystem

2013-06-25 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-4933:


 Summary: add symlink support to WebHDFS to HTTP REST API & client 
filesystem
 Key: HDFS-4933
 URL: https://issues.apache.org/jira/browse/HDFS-4933
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur


follow up on HADOOP-8040

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Jenkins build became unstable: Hadoop-Hdfs-0.23-Build #649

2013-06-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/649/



Hadoop-Hdfs-0.23-Build - Build # 649 - Unstable

2013-06-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/649/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 11842 lines...]
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:test-jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ 
hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable 
package
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ 
hadoop-hdfs-project ---
[INFO] Installing 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/pom.xml
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-hdfs-project/0.23.9-SNAPSHOT/hadoop-hdfs-project-0.23.9-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @ hadoop-hdfs-project 
---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-dependency-plugin:2.1:build-classpath (build-classpath) @ 
hadoop-hdfs-project ---
[INFO] No dependencies found.
[INFO] Skipped writing classpath file 
'/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/target/classes/mrapp-generated-classpath'.
  No changes found.
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:test-jar-no-fork (hadoop-java-sources) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ 
hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable 
package
[INFO] 
[INFO] --- maven-checkstyle-plugin:2.6:checkstyle (default-cli) @ 
hadoop-hdfs-project ---
[INFO] 
[INFO] --- findbugs-maven-plugin:2.3.2:findbugs (default-cli) @ 
hadoop-hdfs-project ---
[INFO] ** FindBugsMojo execute ***
[INFO] canGenerate is false
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop HDFS  SUCCESS [4:53.650s]
[INFO] Apache Hadoop HttpFS .. SUCCESS [51.089s]
[INFO] Apache Hadoop HDFS Project  SUCCESS [0.059s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 5:45.406s
[INFO] Finished at: Tue Jun 25 11:39:27 UTC 2013
[INFO] Final Memory: 52M/747M
[INFO] 
+ /home/jenkins/tools/maven/latest/bin/mvn test 
-Dmaven.test.failure.ignore=true -Pclover 
-DcloverLicenseLocation=/home/jenkins/tools/clover/latest/lib/clover.license
Archiving artifacts
Recording test results
Build step 'Publish JUnit test result report' changed build result to UNSTABLE
Publishing Javadoc
Recording fingerprints
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Unstable
Sending email for trigger: Unstable



###
## FAILED TESTS (if any) 
##
3 tests failed.
REGRESSION:  
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1

Error Message:
Timed out waiting for corrupt replicas. Waiting for 1, but only found 0

Stack Trace:
java.util.concurrent.TimeoutException: Timed out waiting for corrupt replicas. 
Waiting for 1, but only found 0
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitCorruptReplicas(DFSTestUtil.java:330)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.blockCorruptionRecoveryPolicy(TestDatanodeBlockScanner.java:288)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.__CLR3_0_2wadu2t10id(TestDatanodeBlockScanner.java:236)
at 
org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1(TestDatanodeBlockScanner.java:233)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at juni

[jira] [Created] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4932:
-

 Summary: Avoid a long line on the name node webUI if we have more 
Journal nodes
 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta


If there are many journal nodes, the name node webUI shows them all on one 
long line. This patch wraps the line, showing three journal nodes per line.
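
A minimal sketch of the intended rendering (hypothetical helper, not the 
actual webUI code):

{code}
// Hypothetical helper: emit a line break after every third journal node.
static String renderJournalNodes(java.util.List<String> nodes) {
  StringBuilder out = new StringBuilder();
  for (int i = 0; i < nodes.size(); i++) {
    out.append(nodes.get(i));
    out.append((i + 1) % 3 == 0 ? "<br/>" : " ");
  }
  return out.toString();
}
{code}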

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


A question for txid

2013-06-25 Thread Azuryy Yu
Hi dear All,

The txid is currently a long type,

FSImage.java:

boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
    throws IOException {
  // ... (image and edit log loading elided from this excerpt)
  editLog.setNextTxId(lastAppliedTxId + 1L);
  // ...
}

Is it possible that (lastAppliedTxId + 1L) exceeds Long.MAX_VALUE?