[jira] [Commented] (HDFS-1371) One bad node can incorrectly flag many files as corrupt
[ https://issues.apache.org/jira/browse/HDFS-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032263#comment-13032263 ] Konstantin Shvachko commented on HDFS-1371: --- I understand you are trying to avoid writing verification logic on the NN, and you don't want to trust clients, as they can be wrong. I agree with the first part of your design, which reports a bad replica if the client can read at least one. I disagree that failure of all replicas should be concealed from the NN. Can we do this: if a client reports all replicas corrupt, then the NN chooses one of the remaining (presumed healthy) replicas and adds it to the corrupt set. In this case one bad client cannot immediately corrupt the entire block, but if the block is really corrupt then eventually all replicas will be marked corrupt by different clients reading it. As I said, not seeing something in the past doesn't mean you should not plan for it. In real life things may change quickly. You can get a shipment of faulty drives or get buggy software (not necessarily in Hadoop). With your solution you will not even know there is a problem. One bad node can incorrectly flag many files as corrupt --- Key: HDFS-1371 URL: https://issues.apache.org/jira/browse/HDFS-1371 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 0.20.1 Environment: yahoo internal version [knoguchi@gwgd4003 ~]$ hadoop version Hadoop 0.20.104.3.1007030707 Reporter: Koji Noguchi Assignee: Tanping Wang Attachments: HDFS-1371.04252011.patch, HDFS-1371.0503.patch On our cluster, 12 files were reported as corrupt by fsck even though the replicas on the datanodes were healthy. Turns out that all the replicas (12 files x 3 replicas per file) were reported corrupt from one node. Surprisingly, these files were still readable/accessible from dfsclient (-get/-cat) without any problems. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
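One reading of Konstantin's proposal, in sketch form; the helper names (getPresumedHealthyReplicas, chooseOne, markReplicaCorrupt) are hypothetical, not the actual NameNode API:
{code}
// Sketch: when a client claims every replica is corrupt, mark only ONE
// presumed-healthy replica corrupt. A single bad client then cannot kill
// the whole block, but a genuinely corrupt block still converges, because
// each subsequent client reading it marks one more replica.
void handleAllReplicasCorrupt(Block block) {
  List<DatanodeID> healthy = getPresumedHealthyReplicas(block);
  if (!healthy.isEmpty()) {
    markReplicaCorrupt(block, chooseOne(healthy));
  }
}
{code}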
[jira] [Created] (HDFS-1924) Block information displayed in UI is incorrect
Block information displayed in UI is incorrect --- Key: HDFS-1924 URL: https://issues.apache.org/jira/browse/HDFS-1924 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append Reporter: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.20-append Problem statement: Deleted blocks are not removed from the blockmap. Solution: Whenever delete is called, the block entry must be removed from the block map and also moved to invalidates. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
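A minimal sketch of the proposed fix, assuming hypothetical names for the block map and the invalidation queue (the real FSNamesystem internals may differ):
{code}
// On delete, drop each block from the block map (so the UI stops counting
// it) and queue it for invalidation on the datanodes.
void removeBlocks(INodeFile file) {
  for (Block b : file.getBlocks()) {
    blocksMap.removeBlock(b);  // hypothetical: remove the map entry
    addToInvalidates(b);       // hypothetical: schedule replica deletion
  }
}
{code}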
[jira] [Updated] (HDFS-1891) TestBackupNode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1891: -- Affects Version/s: (was: 0.23.0) 0.22.0 0.22 should have the same problem. Could you please commit it to .22? TestBackupNode fails intermittently --- Key: HDFS-1891 URL: https://issues.apache.org/jira/browse/HDFS-1891 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1891.part2.patch, HDFS-1891.patch TestBackupNode fails due to an unexpected ipv6 address format. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032268#comment-13032268 ] Bharath Mundlapudi commented on HDFS-1905: -- The cluster ID is displayed on the dfshealth web page. If we have multiple clusters, then having a proper cluster name defined by admins will be useful. If the user executes the following command, then the correct usage is indeed displayed: ./hdfs namenode -format -help This should be corrected in all the paths. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a namenode format command, which changed in 0.23, it should let the user know how to use the command when the complete options are not specified. ./hdfs namenode -format I get the following error msg, but it is still not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
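The kind of change being asked for might look like the sketch below: validate the arguments up front and print usage text instead of surfacing a bare IllegalArgumentException. The exact usage string is an assumption, not the committed fix:
{code}
// Hypothetical argument check in the -format path: fail with usage text
// rather than a stack trace when the cluster id is missing.
if (clusterId == null || clusterId.isEmpty()) {
  System.err.println("Usage: hdfs namenode -format [-clusterid <cid>]");
  System.err.println("Format must be provided with a cluster id.");
  System.exit(-1);
}
{code}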
[jira] [Commented] (HDFS-1919) Upgrade to federated namespace fails
[ https://issues.apache.org/jira/browse/HDFS-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032270#comment-13032270 ] Suresh Srinivas commented on HDFS-1919: --- I do not see any restarts failing. Perhaps you did not do a clean build and the static LV was not updated in some classes. However, I do see that bumping up the layout version for 203, 22 and trunk (from HDFS-1842 and HDFS-1824) has caused problems with some of the version checks for the new fsimage loading scheme, the checksum of edits, etc. I will change the title of this bug and fix them. Upgrade to federated namespace fails Key: HDFS-1919 URL: https://issues.apache.org/jira/browse/HDFS-1919 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Suresh Srinivas Priority: Blocker Fix For: 0.23.0 Attachments: hdfs-1919.txt I formatted a namenode running off the 0.22 branch, and trying to start it on trunk yields: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/name1 is in an inconsistent state: file VERSION has clusterID missing. It looks like 0.22 has LAYOUT_VERSION -33, but trunk has LAST_PRE_FEDERATION_LAYOUT_VERSION = -30, which is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
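To make the failure mode concrete: layout versions grow more negative over time, so -33 (0.22) is newer than -30. The constant name below comes from the report; the method is illustrative only, not the actual trunk code:
{code}
// Illustrative only: with LAST_PRE_FEDERATION_LAYOUT_VERSION = -30, a 0.22
// image (layout version -33) is misclassified as federated, so the code
// demands a clusterID that pre-federation images never wrote.
static final int LAST_PRE_FEDERATION_LAYOUT_VERSION = -30; // per the report, should be -33

boolean isPreFederation(int layoutVersion) {
  // -33 >= -30 is false, so the 0.22 image wrongly fails this test
  return layoutVersion >= LAST_PRE_FEDERATION_LAYOUT_VERSION;
}
{code}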
[jira] [Created] (HDFS-1925) SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95
SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95 --- Key: HDFS-1925 URL: https://issues.apache.org/jira/browse/HDFS-1925 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Fix For: 0.22.0 The {{SafeMode()}} constructor has the 0.95f default threshold hard-coded. This should be replaced by the constant {{DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT}}, which is correctly set to 0.999f. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
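The intended change is essentially one line in the SafeModeInfo constructor; the sketch below assumes the usual Configuration.getFloat pattern with the key/constant names from DFSConfigKeys:
{code}
// Before: hard-coded fallback.
this.threshold = conf.getFloat(DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY, 0.95f);

// After: use the shared default, which is correctly set to 0.999f.
this.threshold = conf.getFloat(DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY,
                               DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT);
{code}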
[jira] [Commented] (HDFS-1918) DataXceiver double logs every IOE out of readBlock
[ https://issues.apache.org/jira/browse/HDFS-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032280#comment-13032280 ] Todd Lipcon commented on HDFS-1918: --- Hmm, I half agree with your assessment. But, I think this patch will change the metrics behavior here, no? Maybe we still need to catch certain classes of exception (socket timeout and connection reset by peer) and treat them as successful block reads as far as metrics are concerned? (And probably log at DEBUG level instead of WARN)? DataXceiver double logs every IOE out of readBlock -- Key: HDFS-1918 URL: https://issues.apache.org/jira/browse/HDFS-1918 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.20.2 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Trivial Fix For: 0.22.0 Attachments: HDFS-1918.patch DataXceiver will log an IOE twice because opReadBlock() will catch it, log a WARN, then throw it again only to be caught in run() as a Throwable and logged as an ERROR. As far as I can tell all the information is the same in both messages. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
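What Todd is suggesting might look roughly like the following inside the read path (Java 6 era, so separate catch blocks; the metrics call and variable names are assumptions, not the actual DataXceiver code):
{code}
try {
  blockSender.sendBlock(out, baseStream, null);
} catch (SocketTimeoutException e) {
  // Client went away or was slow: routine, so log quietly and fall
  // through. "Connection reset by peer" would need similar treatment.
  LOG.debug("Timeout sending " + block + " to " + remoteAddress, e);
} catch (IOException e) {
  throw e; // real errors propagate and are logged exactly once, in run()
}
datanode.myMetrics.blocksRead.inc(); // a client disconnect still counts as a read served
{code}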
[jira] [Updated] (HDFS-1925) SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95
[ https://issues.apache.org/jira/browse/HDFS-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-1925: - Labels: newbie (was: ) SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95 --- Key: HDFS-1925 URL: https://issues.apache.org/jira/browse/HDFS-1925 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Labels: newbie Fix For: 0.22.0 The {{SafeMode()}} constructor has the 0.95f default threshold hard-coded. This should be replaced by the constant {{DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT}}, which is correctly set to 0.999f. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-671) Documentation change for updated configuration keys.
[ https://issues.apache.org/jira/browse/HDFS-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032285#comment-13032285 ] Dmitriy V. Ryaboy commented on HDFS-671: Guys, While working on getting Pig to play with 0.22, I was getting 35 errors for one of the test cases (TestGrunt). In the process of debugging, I noticed a lot of deprecation warnings concerning fs.default.name and changed the references across the Pig codebase to use fs.defaultFS, just to silence the noise. No other code changes were made between compilations. Suddenly the errors dropped to 13. I believe the cause is that Pig plays pretty loose with switching between Conf and using Properties directly, and was using fs.default.name all over the place. Seems like the deprecation warning and docs should use stronger language -- using the old string anywhere but in the xml file is likely to cause problems, as illustrated by this. Documentation change for updated configuration keys. Key: HDFS-671 URL: https://issues.apache.org/jira/browse/HDFS-671 Project: Hadoop HDFS Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Tom White Priority: Blocker Fix For: 0.22.0 Attachments: HDFS-671.patch HDFS-531, HADOOP-6233 and HDFS-631 have resulted in changes in several config keys. The hadoop documentation needs to be updated to reflect those changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
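The trap Dmitriy describes is that Configuration applies key-deprecation mappings but raw Properties do not. A small illustration (hdfs://nn:8020 is a placeholder):
{code}
// Through Configuration, the deprecated key is still translated (with a
// warning) to fs.defaultFS, so lookups under either name agree.
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://nn:8020");
conf.get("fs.defaultFS"); // resolves via the deprecation mapping

// Raw Properties bypass the deprecation machinery entirely: code reading
// fs.defaultFS sees nothing, which is how tests start failing.
Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://nn:8020");
props.getProperty("fs.defaultFS"); // returns null
{code}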
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032311#comment-13032311 ] Hadoop QA commented on HDFS-1332: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478925/HDFS-1332.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestOverReplicatedBlocks org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/493//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/493//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/493//console This message is automatically generated. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032316#comment-13032316 ] Hadoop QA commented on HDFS-1505: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478913/hdfs-1505-trunk.1.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/492//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/492//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/492//console This message is automatically generated. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032388#comment-13032388 ] Hadoop QA commented on HDFS-1505: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478913/hdfs-1505-trunk.1.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/495//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/495//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/495//console This message is automatically generated. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032400#comment-13032400 ] Hadoop QA commented on HDFS-1332: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478925/HDFS-1332.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestNodeCount org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496//console This message is automatically generated. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1926: - Description: The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
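In outline, the refactoring described above might look like this (simplified, with hypothetical signatures; only the shape matters):
{code}
// The stream itself listens for errors on its containing StorageDirectory,
// so JournalManager no longer needs a getStorageDirectory() method.
class EditLogFileOutputStream extends EditLogOutputStream
    implements NNStorageListener {
  private final StorageDirectory sd;

  EditLogFileOutputStream(StorageDirectory sd) {
    this.sd = sd;
  }

  public void errorOccurred(StorageDirectory errorSd) { // hypothetical callback
    if (errorSd == sd) {
      abort(); // stop writing; NNStorage already knows sd is no longer valid
    }
  }
}
{code}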
[jira] [Updated] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1926: - Attachment: HDFS-1926.diff Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.2, 0.23.0 Reporter: John George Assignee: John George Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because it checks whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the xsaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
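In sketch form, the suspect pattern and one possible fix; logAuditEvent and the surrounding names are illustrative, not necessarily the committed change:
{code}
// Before: a killed client can leave getRemoteIp() null, so a completed
// operation silently produces no audit entry (or an entry with ip=null).
if (auditLog.isInfoEnabled() && Server.getRemoteIp() != null) {
  logAuditEvent(ugi, Server.getRemoteIp(), "create", src, dst, stat);
}

// After (one option): decide "remote or not" from the call context rather
// than from the nullable address.
if (auditLog.isInfoEnabled() && Server.isRpcInvocation()) {
  logAuditEvent(ugi, Server.getRemoteIp(), "create", src, dst, stat);
}
{code}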
[jira] [Commented] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032427#comment-13032427 ] Ivan Kelly commented on HDFS-1926: -- One addendum is that I have temporarily put the code to format edits directories in FSImage#formatOccurred. I did this because the NNStorageListener is not on the streams and these are not created at the time of a format, so formatting would not occur. I assume that HDFS-1073 will get rid of this style of format anyhow, so once sequential editlog filenames are implemented, this can be directly deleted. At this stage NNStorageListener could be reevaluated, as a lot of its usefulness will no longer be needed. Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1926: - Status: Patch Available (was: Open) Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-1787 started by Jonathan Hsieh. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
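The protocol extension suggested in the description could be as small as a new status code that the DataNode returns and the DFSClient maps to a descriptive exception. The existing 0.20-era values are shown for context; the last constant is the hypothetical addition:
{code}
// DataTransferProtocol status codes.
public static final int OP_STATUS_SUCCESS = 0;
public static final int OP_STATUS_ERROR = 1;
public static final int OP_STATUS_ERROR_CHECKSUM = 2;
public static final int OP_STATUS_ERROR_MAX_XCEIVERS_EXCEEDED = 6; // new, hypothetical
{code}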
[jira] [Commented] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032439#comment-13032439 ] Hadoop QA commented on HDFS-1926: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478972/HDFS-1926.diff against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2511 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/498//console This message is automatically generated. Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HDFS-1787: - Attachment: hdfs-1787.patch This patch updates the max transfers/xceivers message so that it gets propagated to the dfs client. I was able to write a reasonable test for the write side, but the read side requires a change to hadoop common. FSDataOutputStream for the write side has a getWrappedStream method, but the FSDataInputStream class for the read side does not have or expose this. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HDFS-1787: - Release Note: This changes the DataTransferProtocol to return a new error code when a max transfers exceeded message is encountered. Status: Patch Available (was: In Progress) Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1903) Fix path display for rm/rmr
[ https://issues.apache.org/jira/browse/HDFS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032444#comment-13032444 ] Daryn Sharp commented on HDFS-1903: --- This patch is ready for integration. Note it only resolves test issues with rm, not all test issues. Fix path display for rm/rmr --- Key: HDFS-1903 URL: https://issues.apache.org/jira/browse/HDFS-1903 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 0.23.0 Attachments: HDFS-1903.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
[ https://issues.apache.org/jira/browse/HDFS-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-1927: -- Attachment: HDFS-1927.patch audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.2, 0.23.0 Reporter: John George Assignee: John George Attachments: HDFS-1927.patch Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because it checks whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the xsaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
[ https://issues.apache.org/jira/browse/HDFS-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-1927: -- Affects Version/s: (was: 0.20.2) Status: Patch Available (was: Open) audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: John George Assignee: John George Attachments: HDFS-1927.patch Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because it checks whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the xsaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032482#comment-13032482 ] Aaron T. Myers commented on HDFS-1505: -- I believe the test failures are unrelated. All of these are presently failing on trunk. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032487#comment-13032487 ] Hadoop QA commented on HDFS-1787: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478976/hdfs-1787.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/499//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/499//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/499//console This message is automatically generated. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1725: - Status: Patch Available (was: Open) Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by a StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem, in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories occur just after construction of the image in any case. All other cases use NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. This guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
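The stipulation in the description reduces to making the directory lists constructor-only; a compressed sketch (the constructor shape and helper are illustrative):
{code}
// URIs are fixed at construction; there is deliberately no
// setStorageDirectories() to call later, so NNStorage and FSEditLog
// can never drift out of sync.
class NNStorage {
  private final List<StorageDirectory> storageDirs;

  NNStorage(Configuration conf,
            Collection<URI> imageDirs, Collection<URI> editsDirs) {
    this.storageDirs = buildStorageDirs(imageDirs, editsDirs); // hypothetical helper
  }
}
{code}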
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032490#comment-13032490 ] Jonathan Hsieh commented on HDFS-1787: -- I will look into these newly failing tests. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1725: - Status: Open (was: Patch Available) Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by a StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem, in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories occur just after construction of the image in any case. All other cases use NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. This guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1725: - Attachment: HDFS-1725.diff Brought up to date with current HDFS-1073 Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by a StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem, in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories occur just after construction of the image in any case. All other cases use NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. This guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032495#comment-13032495 ] Aaron T. Myers commented on HDFS-1787: -- You're encouraged to look at the test failures, but I'm pretty confident they're unrelated to this patch. Those tests are known to be failing on trunk. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1458) Improve checkpoint performance by avoiding unnecessary image downloads
[ https://issues.apache.org/jira/browse/HDFS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032501#comment-13032501 ] Hairong Kuang commented on HDFS-1458: - Todd, you need the fix to HDFS-1627. We have already run this and HDFS-1627, together with image compression, for around 2 months on our large cluster. All seem to be pretty stable and have improved NN availability/responsiveness a lot. Improve checkpoint performance by avoiding unnecessary image downloads -- Key: HDFS-1458 URL: https://issues.apache.org/jira/browse/HDFS-1458 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: checkpoint-checkfsimageissame.patch, trunkNoDownloadImage.patch, trunkNoDownloadImage1.patch, trunkNoDownloadImage2.patch, trunkNoDownloadImage3.patch If the secondary namenode could verify that the image it has on its disk is the same as the one in the primary NameNode, it could skip downloading the image from the primary NN, thus completely eliminating the image download overhead. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
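The core of the optimization is a digest comparison before the transfer; a sketch using Hadoop's MD5Hash, with the surrounding names assumed rather than taken from the patch:
{code}
// Secondary side: compare a digest of the local fsimage against the
// primary's advertised digest and skip the download when they match.
MD5Hash localDigest = MD5Hash.digest(new FileInputStream(localImageFile));
if (localDigest.equals(primaryImageDigest)) {
  LOG.info("fsimage unchanged; skipping download from primary");
} else {
  downloadImageFromPrimary(); // hypothetical: the existing transfer path
}
{code}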
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032515#comment-13032515 ] Eric Yang commented on HDFS-1917: - The patch for this jira is going to assume that hadoop-common third party jar files can be referenced from HADOOP_HOME/lib until HADOOP-6255 and the proposed HADOOP_PREFIX take place, where HADOOP_HOME is the PREFIX directory of hadoop-common-0.2x.y = hadoop-hdfs-0.2x.y = hadoop-mapred-0.2x.y. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, it is currently fetching from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy; on the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032517#comment-13032517 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- - Why not create the reason string directly instead of first creating a HashMap? - reason does not seem like a good variable name. How about failingReason? - Have you tested your patch? When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032520#comment-13032520 ] Tsz Wo (Nicholas), SZE commented on HDFS-1869: -- Would it work if the given permission does not have x, for example 0600? mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions even if inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow POSIX semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
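For reference, the POSIX-like behavior the summary asks for, and why Nicholas' 0600 question bites, in sketch form (pathComponents/mkdirOne are hypothetical helpers, not the HDFS API):
{code}
// Every missing component gets the supplied permission, not just the leaf.
for (String dir : pathComponents(path)) {
  if (!exists(dir)) {
    mkdirOne(dir, permission); // same FsPermission for components 0..N-1 and the leaf
  }
}
// Nicholas' concern: with permission 0600 the intermediate directories lack
// the x bit, so the creating user cannot traverse into them afterwards
// (at least under strict POSIX permission checking).
{code}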
[jira] [Commented] (HDFS-1893) Change edit logs and images to be named based on txid
[ https://issues.apache.org/jira/browse/HDFS-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032533#comment-13032533 ] Ivan Kelly commented on HDFS-1893: -- Mostly looks good. There are a couple of TODOs and things which will need to be addressed, but I guess there'll be a polishing JIRA before any merge back into trunk. I don't like the call to finalizeLogSegment from JournalAndStream. A member variable, segmentStartsAtTxId, is being maintained which would be better encapsulated inside the JournalAndStream or even the EditLogOutputStream itself, as it is a property of the segment, not of that which is writing to it. I understand the rationale for keeping this code out of EditLogOutputStream, but I don't understand why this needs to be called from JournalAndStream. I think it would be better for the stream to notify its manager whenever it is closed. This way the segment is _always_ finalised on a close. So, I would propose the following.
{code}
public class EditLogFileOutputStream {
  public interface ClosureListener {
    public void streamClosed();
  }

  private final ClosureListener listener;

  public EditLogFileOutputStream(File name, int size, ClosureListener listener);

  public void close() {
    ...
    listener.streamClosed();
  }
}

public class FileJournalManager
    implements JournalManager, EditLogFileOutputStream.ClosureListener {
  // etc, etc

  EditLogOutputStream startLogSegment(long txid) {
    return new EditLogFileOutputStream(file, sizeOutputFlushBuffer, this);
  }

  void streamClosed() {
    // what finalize current does.
  }
}
{code}
This removes the need for FSEditLog to know anything about the lifecycle of the streams. It currently has to know that finalizeLogSegment has to be called after stream close, which is clunky. Change edit logs and images to be named based on txid - Key: HDFS-1893 URL: https://issues.apache.org/jira/browse/HDFS-1893 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1893-prelim.txt This is the main subtask of HDFS-1073: actually switch over the naming of the files to the new format as described in the design doc. I imagine it will be split out into a couple separate JIRAs before being committed, but this will still be the big kahuna patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032536#comment-13032536 ] Ted Yu commented on HDFS-1332: -- I created the HashMap because there could be multiple datanodes that were not good targets. When I tried to access https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496/, it seemed to be stuck, so I couldn't see the exact cause of the individual test failures. I ran all the newly reported failed tests in Eclipse: org.apache.hadoop.hdfs.server.namenode.TestNodeCount and TestHDFSTrash, along with TestFileConcurrentReader and TestDFSStorageStateRecovery, which I mentioned yesterday. I have renamed the reason variable in my next patch. Thanks for the review. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
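For context, a minimal sketch of the reason-collecting approach Ted describes; the field and method names below are illustrative, not necessarily the identifiers used in the attached patch:
{code}
import java.util.HashMap;
import java.util.Map;

// One entry per rejected datanode; several nodes can be rejected for
// different reasons during a single chooseTarget() call, hence a map
// rather than a single string.
private final Map<DatanodeDescriptor, String> excludedReasons =
    new HashMap<DatanodeDescriptor, String>();

private void noteNodeNotChosen(DatanodeDescriptor node, String reason) {
  excludedReasons.put(node, reason);
}

// When placement ultimately fails, report every collected reason at once,
// instead of only the bare "Not able to place enough replicas" warning.
FSNamesystem.LOG.warn("Not able to place enough replicas; excluded nodes: "
    + excludedReasons);
{code}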
[jira] [Updated] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-1332: - Attachment: (was: HDFS-1332.patch) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-1332: - Attachment: HDFS-1332.patch Updated the name of the reason variable according to Nicholas's comment. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032553#comment-13032553 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- It seems that logging the reasons is very expensive, so we should only log them when debug is enabled. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032555#comment-13032555 ] Ted Yu commented on HDFS-1332: -- For TestHDFSTrash, I removed my changes in BlockPlacementPolicyDefault, recompiled, and reran the test on the command line on a MacBook. I got:
{code}
Testcase: testTrashEmptier took 0.001 sec
  Caused an ERROR
Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
{code}
When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032556#comment-13032556 ] Matt Foley commented on HDFS-1905: -- bq. What is a use case in which a cluster ID would be manually specified? I can imagine some nasty manual recovery process where you might wish to initialize a clean environment with a specified clusterId, followed by manual injection of data recovered from image and edits files. Not something I would want to do :-) but it probably should be supported. However, I agree -format without other args should create a new cid if no old cid is available. A prompt would be appropriate, same as currently done with re-use of an available old cid. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where the complete options were not specified. ./hdfs namenode -format I get the following error message, but it is still not clear what the user should do or how to use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032560#comment-13032560 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- Hi Ted, although your patch only adds log messages, it may actually cause significant performance degradation in the namenode, since {{BlockPlacementPolicyDefault}} is invoked frequently. Creating the HashMap and the strings for logging seems too expensive to do unconditionally, so all such activity should be executed only in debug mode. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032568#comment-13032568 ] Ted Yu commented on HDFS-1332: -- How about adding a static boolean, blockPlacementDebug, in BlockPlacementPolicyDefault, which is set to true when System.getenv("BLOCK_PLACEMENT_DEBUG") returns "true"? Then the HashMap and the strings for logging would be created only if this boolean is true. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
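A minimal sketch of what Ted is proposing here; the flag and environment variable names are as suggested above, while the surrounding method and map are illustrative:
{code}
public class BlockPlacementPolicyDefault extends BlockPlacementPolicy {
  // Read once at class-load time; the logging-related allocations are
  // skipped entirely unless the operator opts in via the environment.
  private static final boolean blockPlacementDebug =
      "true".equals(System.getenv("BLOCK_PLACEMENT_DEBUG"));

  private void noteNodeNotChosen(DatanodeDescriptor node, String reason) {
    if (blockPlacementDebug) {
      excludedReasons.put(node, reason); // map as in the sketch above
    }
  }
}
{code}
One downside of an environment-variable switch is that it cannot be toggled on a running namenode, unlike a log level.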
[jira] [Updated] (HDFS-1920) libhdfs does not build for ARM processors
[ https://issues.apache.org/jira/browse/HDFS-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Robinson updated HDFS-1920: -- Status: Patch Available (was: Open) libhdfs does not build for ARM processors - Key: HDFS-1920 URL: https://issues.apache.org/jira/browse/HDFS-1920 Project: Hadoop HDFS Issue Type: Bug Components: contrib/libhdfs Affects Versions: 0.21.0 Environment: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper Target: arm-linux-gnueabi Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi Thread model: posix gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) $ uname -a Linux panda0 2.6.38-1002-linaro-omap #3-Ubuntu SMP Fri Apr 15 14:00:54 UTC 2011 armv7l armv7l armv7l GNU/Linux Reporter: Trevor Robinson Attachments: hadoop-hdfs-arm.patch $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... create-libhdfs-configure: ... [exec] configure: error: Unsupported CPU architecture armv7l Once the CPU arch check is fixed in src/c++/libhdfs/m4/apsupport.m4, then next issue is -m32: $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... compile-c++-libhdfs: [exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs\ 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned\ int -Dconst=/\*\*/ -Dvolatile=/\*\*/ -I. 
-I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c -o hdfs.lo /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c [exec] make: Warning: File `.deps/hdfs_write.Po' has modification time 2.1 s in the future [exec] libtool: compile: gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned int -Dconst=/**/ -Dvolatile=/**/ -I. -I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c -fPIC -DPIC -o .libs/hdfs.o [exec] cc1: error: unrecognized command line option -m32 [exec] make: *** [hdfs.lo] Error 1 Here, gcc does not support -m32 for the ARM target, so -m${JVM_ARCH} must be omitted from CFLAGS and LDFLAGS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Attachment: NPE_SNN1.patch A patch with a unit test. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
[ https://issues.apache.org/jira/browse/HDFS-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032585#comment-13032585 ] Hadoop QA commented on HDFS-1927: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478979/HDFS-1927.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause the tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/500//testReport/ Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/500//console This message is automatically generated. audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: John George Assignee: John George Attachments: HDFS-1927.patch Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because they check whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the transaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Status: Open (was: Patch Available) Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Status: Patch Available (was: Open) Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032591#comment-13032591 ] dhruba borthakur commented on HDFS-1627: +1, looks good to me. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032592#comment-13032592 ] Matt Foley commented on HDFS-1505: -- Regarding the check for
{code}
+    if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0 &&
+        storage.getNumStorageDirs(NameNodeDirType.IMAGE_AND_EDITS) == 0) {
+      throw new IOException("Failed to save any storage directories while saving namespace");
{code}
Isn't the desired check actually
{code}
+    if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0 ||
+        storage.getNumStorageDirs(NameNodeDirType.EDITS) == 0) {
+      throw new IOException("Failed to save at least one storage directory for both IMAGE and EDITS while saving namespace");
{code}
Also, since HDFS-1826 copied the concurrent saveNamespace() logic into FSImage.doUpgrade(), would you please add the same code fragment to the end of doUpgrade(), and a corresponding corruption unit test case to TestDFSUpgrade? Thanks. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032595#comment-13032595 ] Hadoop QA commented on HDFS-1725: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478988/HDFS-1725.diff against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/502//console This message is automatically generated. Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs, and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem-based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories are called just after construction of the image in any case. All other cases are using NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. The attached guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley reassigned HDFS-1921: Assignee: Matt Foley Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032597#comment-13032597 ] Matt Foley commented on HDFS-1921: -- I will propose a patch for this, unless Dmytro wants it. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032607#comment-13032607 ] Matt Foley commented on HDFS-1505: -- One more related issue: at the end of saveNamespace() it calls editLog.open(), which is implemented by FSEditLog.open(). This routine has the same problem: if the list of EditLogOutputStreams is empty, it appears to succeed, but it should throw an exception. I would suggest fixing the lack of notification in FSEditLog.open(); also, in your patch to saveNamespace(), the check for empty IMAGE and EDITS lists should precede the call to editLog.open(). saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
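A sketch of the kind of check Matt is suggesting in FSEditLog.open(); the field name editStreams is illustrative of the stream list FSEditLog maintains:
{code}
public synchronized void open() throws IOException {
  // ... create an EditLogOutputStream for each usable edits directory ...
  if (editStreams == null || editStreams.isEmpty()) {
    // Previously this fell through silently, so open() appeared to
    // succeed even with nowhere to write edits.
    throw new IOException("Failed to open edit log: no usable edits directories");
  }
}
{code}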
[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032608#comment-13032608 ] Aaron T. Myers commented on HDFS-1921: -- Hey Matt, that's great news. Thanks for picking this up. I just talked to Todd, and he agrees that this code will be superseded in 0.23 by the work that's going on in HDFS-1073. So, I think it's reasonable to only work on a patch for 0.22 as part of this JIRA. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDFS-1917: Attachment: HDFS-1917.patch * Changed the ivy configuration to set up third-party jar files for the compile profile. * The common profile contains only commons-daemon, to be included in HADOOP_HOME/lib. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Attachments: HDFS-1917.patch For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy, or take the opportunity to convert the build structure to maven and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDFS-1917: Assignee: Eric Yang Release Note: Remove packaging of duplicated third party jar files Status: Patch Available (was: Open) Remove packaging of duplicated third party jar files Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917.patch For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy, or take the opportunity to convert the build structure to maven and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032636#comment-13032636 ] Todd Lipcon commented on HDFS-1332: --- Hey Nicholas. I thought about the performance impact as well, but I came to the conclusion that the node-selection code is not a hot code path. In my experience, the NN spends much, much more time on read operations than on block allocation. For example, on one production NN whose metrics I have access to, it has performed 3.6M addBlock operations vs 105M FileInfoOps, 30M GetListing ops, 27M GetBlockLocations ops. Additionally, the new code will only get run for nodes which are decommissioning, out of space, or highly loaded. Thus it's not likely that it will add any appreciable overhead to most chooseTarget operations. Looking at the existing code, it's hardly optimized at all. For example, each invocation of chooseRandom() invokes countNumOfAvailableNodes, which takes and releases locks, computes String substrings, etc. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032638#comment-13032638 ] Todd Lipcon commented on HDFS-1627: --- Me too, thanks Hairong! Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032641#comment-13032641 ] Aaron T. Myers commented on HDFS-1505: -- Thanks a lot for the review/comments, Matt. Upon further reflection, I think the desired check should actually be:
{code}
if ((storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0 ||
     storage.getNumStorageDirs(NameNodeDirType.EDITS) == 0) &&
    storage.getNumStorageDirs(NameNodeDirType.IMAGE_AND_EDITS) == 0) {
  throw new IOException("Failed to save any storage directories while saving namespace");
{code}
What do you think? Note that IMAGE_AND_EDITS is a distinct type of storage directory, which contains both {{fsimage}} and {{edits}} files. Apologies if you already knew that. bq. Also, since HDFS-1826 copied the concurrent saveNamespace() logic into FSImage.doUpgrade(), would you please add the same code fragment to the end of doUpgrade(), and a corresponding corruption unit test case to TestDFSUpgrade? Thanks. Will do. bq. I would suggest fixing the lack of notification in FSEditLog.open(), but also in your patch to saveNamespace() the check for empty IMAGE and EDITS lists should precede the call to editLog.open(). Agreed. Will do. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032644#comment-13032644 ] Luke Lu commented on HDFS-1917: --- Though I understand the goal is to separate the hdfs-only dependencies for easier dedup, it seems to me that if you keep the common profile as is and add an hdfs profile for commons-daemon, the patch would be smaller and less confusing (as it stands, the common profile contains hdfs-only dependencies, and the compile profile is actually from common). Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917.patch For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy, or take the opportunity to convert the build structure to maven and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032642#comment-13032642 ] Hadoop QA commented on HDFS-1332: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478995/HDFS-1332.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/504//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/504//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/504//console This message is automatically generated. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032650#comment-13032650 ] Hadoop QA commented on HDFS-1627: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12479004/NPE_SNN1.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/503//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/503//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/503//console This message is automatically generated. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032651#comment-13032651 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- How about adding {{BlockPlacementPolicyDefault.LOG}} and using it to print the messages when debug is enabled? When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
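A minimal sketch of that alternative, using the commons-logging idiom already common in the codebase; the LOG field is the proposed addition, while the surrounding method is illustrative:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class BlockPlacementPolicyDefault extends BlockPlacementPolicy {
  static final Log LOG = LogFactory.getLog(BlockPlacementPolicyDefault.class);

  private void noteNodeNotChosen(DatanodeDescriptor node, String reason) {
    // Guarding with isDebugEnabled() keeps the reason strings from ever
    // being built on the hot path unless debug logging is switched on.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Node " + node + " was not chosen because " + reason);
    }
  }
}
{code}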
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032653#comment-13032653 ] Tsz Wo (Nicholas), SZE commented on HDFS-1869: -- I think it is good to add a unit test for the 0600 case. It would also illustrate the expected behavior. BTW, have you verified it on BSD Unix? We should make it the same as BSD, which our implementation is based on. mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions, even if inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow POSIX semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
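For illustration, a hedged sketch of what such a test might look like; the test name and assertions are hypothetical, and the MiniDFSCluster/FileSystem setup is elided:
{code}
public void testMkdirsWithNoExecutePermission() throws IOException {
  FsPermission perm = new FsPermission((short) 0600);
  Path dir = new Path("/a/b/c");
  assertTrue(fs.mkdirs(dir, perm));
  // Every directory created by the call should carry the supplied mode,
  // subject to whatever intermediate-directory semantics are settled on
  // (without x on /a and /a/b, /a/b/c would be unreachable).
  assertEquals(perm, fs.getFileStatus(dir).getPermission());
}
{code}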
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032655#comment-13032655 ] Suresh Srinivas commented on HDFS-1905: --- The reason format requires a cluster ID is the following: # When you add new namenodes to the existing cluster, they become part of the federated cluster only if the same cluster ID is used. Otherwise, it is a different cluster. # This leaves us two choices: allow automatic generation of the cluster ID for the first namenode, then expect the admin to use the same cluster ID for formatting additional namenodes. But this leaves us with an admin accidentally formatting an additional namenode without specifying a cluster ID, so that a cluster ID is automatically generated. The namenode that was intended to be part of the same cluster now is not! Given this, we decided not to automatically generate the cluster ID. An admin must specify it. bq. A prompt would be appropriate, same as currently done with re-use of an available old cid. I do not think this solves the problem I stated. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where the complete options were not specified. ./hdfs namenode -format I get the following error message, but it is still not clear what the user should do or how to use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1920) libhdfs does not build for ARM processors
[ https://issues.apache.org/jira/browse/HDFS-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032654#comment-13032654 ] Hadoop QA commented on HDFS-1920: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478901/hadoop-hdfs-arm.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/501//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/501//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/501//console This message is automatically generated. libhdfs does not build for ARM processors - Key: HDFS-1920 URL: https://issues.apache.org/jira/browse/HDFS-1920 Project: Hadoop HDFS Issue Type: Bug Components: contrib/libhdfs Affects Versions: 0.21.0 Environment: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper Target: arm-linux-gnueabi Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi Thread model: posix gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) $ uname -a Linux panda0 2.6.38-1002-linaro-omap #3-Ubuntu SMP Fri Apr 15 14:00:54 UTC 2011 armv7l armv7l armv7l GNU/Linux Reporter: Trevor Robinson Attachments: hadoop-hdfs-arm.patch $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... create-libhdfs-configure: ... 
[exec] configure: error: Unsupported CPU architecture armv7l Once the CPU arch check is fixed in src/c++/libhdfs/m4/apsupport.m4, then next issue is -m32: $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... compile-c++-libhdfs: [exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs\ 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned\ int -Dconst=/\*\*/ -Dvolatile=/\*\*/ -I. -I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c -o hdfs.lo
[jira] [Assigned] (HDFS-1762) Allow TestHDFSCLI to be run against a cluster
[ https://issues.apache.org/jira/browse/HDFS-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik reassigned HDFS-1762: Assignee: Konstantin Boudnik Allow TestHDFSCLI to be run against a cluster - Key: HDFS-1762 URL: https://issues.apache.org/jira/browse/HDFS-1762 Project: Hadoop HDFS Issue Type: Test Reporter: Tom White Assignee: Konstantin Boudnik Currently TestHDFSCLI starts mini clusters to run tests against. It would be useful to be able to support running against arbitrary clusters for testing purposes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032657#comment-13032657 ] Todd Lipcon commented on HDFS-1905: --- Since the vast majority of users will not be using the federation feature, I think it's best to optimize for the common case and not for federated clusters. That is to say, we don't want to pollute the mental model of HDFS for new users by making them understand cluster IDs, block pools, etc. bq. But this leaves us with an admin accidentally formatting an additional namenode without specifying a cluster ID, and a cluster ID is automatically generated. The namenode that was intended to be part of the same cluster now is not! Sure, but they will figure this out before they put any data into it (since the DNs won't talk to this NN). And then calling format again with the correct cluster ID specified is no problem at all for them. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032661#comment-13032661 ] Todd Lipcon commented on HDFS-1332: --- I don't think restricting nice error messages to the case when the NN is in debug mode is a good idea. We should endeavor to always have error messages that provide enough information to the user to understand and rectify the problem. New users are unlikely to know the tricks to switch over to debug mode using the cryptic daemonlog interface, and new users are the ones who need nice errors the most. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
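To make the suggested improvement concrete, here is a minimal sketch of the idea in the description: collect one human-readable reason per excluded node, then emit the reasons alongside the existing "Not able to place enough replicas" message. The names here (Node, chooseTargets, the two exclusion criteria) are illustrative, not the actual BlockPlacementPolicy API.
{code}
import java.util.ArrayList;
import java.util.List;

public class PlacementDiagnostics {
    static class Node {
        final String name; final boolean tooBusy; final boolean lowSpace;
        Node(String name, boolean tooBusy, boolean lowSpace) {
            this.name = name; this.tooBusy = tooBusy; this.lowSpace = lowSpace;
        }
    }

    /** Returns chosen nodes; fills 'reasons' with one entry per excluded node. */
    static List<Node> chooseTargets(List<Node> candidates, int needed, List<String> reasons) {
        List<Node> chosen = new ArrayList<>();
        for (Node n : candidates) {
            if (chosen.size() == needed) break;
            if (n.tooBusy)  { reasons.add(n.name + ": too many active transfers"); continue; }
            if (n.lowSpace) { reasons.add(n.name + ": not enough free space"); continue; }
            chosen.add(n);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> reasons = new ArrayList<>();
        List<Node> picked = chooseTargets(List.of(
            new Node("dn1", true, false), new Node("dn2", false, true)), 2, reasons);
        if (picked.size() < 2) {
            // Instead of just "Not able to place enough replicas", say why:
            System.err.println("Not able to place enough replicas: " + reasons);
        }
    }
}
{code}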
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032666#comment-13032666 ] Daryn Sharp commented on HDFS-1869: --- Ok, test will be added. I haven't tested on *BSD for lack of easy access, but I have tested on Darwin. However, the bsd man page for mkdir states: bq. -p Create intermediate directories as required. If this option is not specified, the full path prefix of each operand must already exist. On the other hand, with this option specified, no error will be reported if a directory given as an operand already exists. *Intermediate directories are created with permission bits of rwxrwxrwx (0777) as modified by the current umask*, plus write and search permission for the owner. Will double-check that write and search are indeed implicitly added. mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions -even if- inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow posix semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
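The rule quoted from the man page reduces to bit arithmetic: intermediate directories get (0777 & ~umask) | 0300. A small sketch, with an example umask chosen so the owner write/search addition is visible:
{code}
// The POSIX mkdir -p rule for intermediate directories, as bit arithmetic.
public class MkdirPerms {
    public static void main(String[] args) {
        int umask = 0277;                          // example process umask
        int intermediate = (0777 & ~umask) | 0300; // 0777 modified by umask,
                                                   // plus owner write + search
        System.out.printf("intermediate dirs get %04o%n", intermediate); // 0700
    }
}
{code}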
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032668#comment-13032668 ] Suresh Srinivas commented on HDFS-1905: --- A couple of other comments I missed: During design we wanted to ensure the cluster ID is unique - to avoid accidentally naming two clusters with the same cluster ID. To do that, we have an option to generate a unique, UUID-like cluster ID. Instead of a complicated UUID-like string to identify a cluster, it would be good to use a name. Given the small number of clusters, coming up with a simple naming scheme should not be hard. Given that, we should delete the functionality to generate cluster IDs. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1920) libhdfs does not build for ARM processors
[ https://issues.apache.org/jira/browse/HDFS-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032670#comment-13032670 ] Trevor Robinson commented on HDFS-1920: --- No tests included because this change just fixes a build failure. Manually verified that x86-64 builds unchanged (-m64 is properly specified) and that ARM now builds (-m32 is not specified). Core unit test failures are existing and unrelated issues. This change only affects libhdfs. Would a committer please review the change? libhdfs does not build for ARM processors - Key: HDFS-1920 URL: https://issues.apache.org/jira/browse/HDFS-1920 Project: Hadoop HDFS Issue Type: Bug Components: contrib/libhdfs Affects Versions: 0.21.0 Environment: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper Target: arm-linux-gnueabi Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi Thread model: posix gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) $ uname -a Linux panda0 2.6.38-1002-linaro-omap #3-Ubuntu SMP Fri Apr 15 14:00:54 UTC 2011 armv7l armv7l armv7l GNU/Linux Reporter: Trevor Robinson Attachments: hadoop-hdfs-arm.patch $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... create-libhdfs-configure: ... [exec] configure: error: Unsupported CPU architecture armv7l Once the CPU arch check is fixed in src/c++/libhdfs/m4/apsupport.m4, then next issue is -m32: $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... compile-c++-libhdfs: [exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs\ 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned\ int -Dconst=/\*\*/ -Dvolatile=/\*\*/ -I. 
-I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c -o hdfs.lo /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c [exec] make: Warning: File `.deps/hdfs_write.Po' has modification time 2.1 s in the future [exec] libtool: compile: gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned int -Dconst=/**/ -Dvolatile=/**/ -I. -I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c -fPIC -DPIC -o .libs/hdfs.o [exec] cc1: error: unrecognized command line option -m32 [exec] make: *** [hdfs.lo] Error 1 Here, gcc does not support -m32 for the ARM target, so -m${JVM_ARCH} must be omitted from CFLAGS and LDFLAGS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032671#comment-13032671 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- Hi Todd, it is questionable whether this is a nice error message. Too many error messages confuse users. Replication also uses {{BlockPlacementPolicy}}. Have you taken that into account? Also, your example is just one example. It may not be representative. Moreover, the performance degradation is twofold: # it takes time to create the messages/objects, and # it creates additional objects for GC. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
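A common way to limit both costs Nicholas lists is to guard message construction behind a level check, so no strings or objects are built unless debug logging is actually on. A sketch against the commons-logging API Hadoop uses (the method and message are illustrative):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GuardedLogging {
    private static final Log LOG = LogFactory.getLog(GuardedLogging.class);

    void logExclusion(String node, String reason) {
        if (LOG.isDebugEnabled()) {               // no string concatenation,
            LOG.debug("excluded " + node + ": " + reason); // no garbage, unless
        }                                         // debug logging is enabled
    }
}
{code}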
[jira] [Updated] (HDFS-1903) Fix path display for rm/rmr
[ https://issues.apache.org/jira/browse/HDFS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1903: - Hadoop Flags: [Reviewed] +1 patch looks good. Fix path display for rm/rmr --- Key: HDFS-1903 URL: https://issues.apache.org/jira/browse/HDFS-1903 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 0.23.0 Attachments: HDFS-1903.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032679#comment-13032679 ] Daryn Sharp commented on HDFS-1869: --- Creating a multi-level dir with 0600 creates all dirs with 0600 -- will post test shortly. So it works as expected in the sense that you get the permissions you asked for. It's neglecting to implicitly add u+rx. In unix this is required since mkdir -p does a series of mkdir/chdir, so u+rx is required to do the chdir calls. In hdfs it's not necessary since it verifies permissions in the directory where the mkdir originates, and then creates all the dirs with no permission checking. Do we want the u+rx behavior added too? If so, would it be ok to do it in a separate jira? mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions -even if- inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow posix semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
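For illustration, the implicit u+rx behavior Daryn mentions amounts to OR-ing 0500 into the supplied mode for the intermediate directories only; a sketch of the arithmetic (a hypothetical example, not HDFS code):
{code}
public class ImplicitUrx {
    public static void main(String[] args) {
        short requested = 0600;                          // rw------- as supplied to mkdirs
        short intermediate = (short) (requested | 0500); // u+rx keeps the path traversable
        System.out.printf("%04o -> %04o%n", requested, intermediate); // 0600 -> 0700
    }
}
{code}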
[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032680#comment-13032680 ] Aaron T. Myers commented on HDFS-1921: -- Also, I should mention that there's a test case posted on HDFS-1505 which will illustrate this case. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
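For illustration, a minimal sketch of how the three-step sequence in the description could defer step 3 until at least one directory has survived step 2, failing loudly when none did. The types and helper methods are hypothetical stand-ins, not the actual FSImage/NNStorage code or the attached patch:
{code}
import java.util.ArrayList;
import java.util.List;

public class SaveNamespaceSketch {
    static class StorageDir { final String path; StorageDir(String p) { path = p; } }

    void saveNamespace(List<StorageDir> dirs) throws Exception {
        List<StorageDir> saved = new ArrayList<>();
        for (StorageDir sd : dirs) renameCurrentToLastCheckpointTmp(sd);   // step 1
        for (StorageDir sd : dirs) {
            try {
                saveImageAndRecreateEdits(sd);                             // step 2
                saved.add(sd);
            } catch (Exception e) {
                // leave lastcheckpoint.tmp in place so restart can recover this dir
            }
        }
        if (saved.isEmpty()) {
            throw new Exception("saveNamespace failed in all storage directories");
        }
        for (StorageDir sd : saved) renameLastCheckpointTmpToPrevious(sd); // step 3,
        // now only for directories whose step 2 actually succeeded
    }

    void renameCurrentToLastCheckpointTmp(StorageDir sd) {}
    void saveImageAndRecreateEdits(StorageDir sd) throws Exception {}
    void renameLastCheckpointTmpToPrevious(StorageDir sd) {}
}
{code}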
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032682#comment-13032682 ] Jitendra Nath Pandey commented on HDFS-1592: 1. There seems to be a redundancy in the following conditions: (volsFailed > volFailuresTolerated) and (validVolsRequired > storage.getNumStorageDirs()). Since both checks throw the same exception, I would recommend doing it in one condition. 2. Please don't remove the DataNode.LOG.error. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
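A sketch of the single combined condition Jitendra suggests, assuming the variable names from his comment (the surrounding method is hypothetical, not the patch itself):
{code}
public class VolumeCheck {
    static void checkVolumes(int volsFailed, int volFailuresTolerated,
                             int validVolsRequired, int numStorageDirs)
            throws Exception {
        // One check, one exception, instead of two checks throwing the same thing.
        if (volsFailed > volFailuresTolerated || validVolsRequired > numStorageDirs) {
            throw new Exception("Too many failed volumes: " + volsFailed
                + " failed, " + volFailuresTolerated + " tolerated");
        }
    }
}
{code}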
[jira] [Updated] (HDFS-1903) Fix path display for rm/rmr
[ https://issues.apache.org/jira/browse/HDFS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1903: - Resolution: Fixed Status: Resolved (was: Patch Available) I have committed this. Thanks, Daryn! Fix path display for rm/rmr --- Key: HDFS-1903 URL: https://issues.apache.org/jira/browse/HDFS-1903 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 0.23.0 Attachments: HDFS-1903.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I just committed this! The failed tests are not related to this patch. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDFS-1917: Attachment: HDFS-1917-1.patch Revised patch to add hdfs ivy configuration. Thanks Luke! Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917-1.patch, HDFS-1917.patch For trunk, the build and deployment tree look like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy. On the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-1904) Secondary Namenode dies when a mkdir on a non-existent parent directory is run
[ https://issues.apache.org/jira/browse/HDFS-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang resolved HDFS-1904. - Resolution: Duplicate Secondary Namenode dies when a mkdir on a non-existent parent directory is run -- Key: HDFS-1904 URL: https://issues.apache.org/jira/browse/HDFS-1904 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Environment: Linux Reporter: Ravi Prakash Priority: Blocker Steps to reproduce: 1. Configure secondary namenode with {{fs.checkpoint.period}} set to a small value (eg 3 seconds) 2. Format filesystem and start HDFS 3. hadoop fs -mkdir /foo/bar ; sleep 5 ; echo | hadoop fs -put - /foo/bar/baz 2NN will crash with the following trace on the next checkpoint. The primary NN also crashes on next restart 11/05/10 15:19:28 ERROR namenode.SecondaryNameNode: Throwable Exception in doCheckpoint: 11/05/10 15:19:28 ERROR namenode.SecondaryNameNode: java.lang.NullPointerException: Panic: parent does not exist at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1693) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1707) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1544) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:288) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:116) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:62) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:723) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:720) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$500(SecondaryNameNode.java:610) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:487) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:448) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1377) Quota bug for partial blocks allows quotas to be violated
[ https://issues.apache.org/jira/browse/HDFS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1377: - Fix Version/s: 0.20.204.0 I have merged this to branch-0.20-security-204. Quota bug for partial blocks allows quotas to be violated -- Key: HDFS-1377 URL: https://issues.apache.org/jira/browse/HDFS-1377 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0, 0.23.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Blocker Fix For: 0.20.3, 0.20.204.0, 0.20.205.0, 0.21.1, Federation Branch, 0.22.0, 0.23.0 Attachments: HDFS-1377.patch, hdfs-1377-1.patch, hdfs-1377-b20-1.patch, hdfs-1377-b20-2.patch, hdfs-1377-b20-3.patch There's a bug in the quota code that causes quotas not to be respected when a file is not an exact multiple of the block size. Here's an example: {code} $ hadoop fs -mkdir /test $ hadoop dfsadmin -setSpaceQuota 384M /test $ ls dir/ | wc -l # dir contains 101 files 101 $ du -ms dir # each is 3mb 304 dir $ hadoop fs -put dir /test $ hadoop fs -count -q /test none inf 402653184 -550502400 2 101 317718528 hdfs://haus01.sf.cloudera.com:10020/test $ hadoop fs -stat %o %r /test/dir/f30 134217728 3 # three 128mb blocks {code} INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative). This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota. However the bug isn't visible to users because {{fs count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value. In FSDirectory#addBlock the disk space consumed is set conservatively to the full block size * the number of replicas: {code} updateCount(inodes, inodes.length-1, 0, fileNode.getPreferredBlockSize()*fileNode.getReplication(), true); {code} In FSNamesystem#addStoredBlock we adjust for this conservative estimate by subtracting out the difference between the conservative estimate and the number of bytes actually stored: {code} //Updated space consumed if required. INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null; long diff = (file == null) ? 0 : (file.getPreferredBlockSize() - storedBlock.getNumBytes()); if (diff > 0 && file.isUnderConstruction() && cursize < storedBlock.getNumBytes()) { ... dir.updateSpaceConsumed(path, 0, -diff*file.getReplication()); {code} We do the same in FSDirectory#replaceNode when completing the file, but at a file granularity (I believe the intent here is to correct for the cases when there's a failure replicating blocks and recovery). Since oldnode is under construction, INodeFile#diskspaceConsumed will use the preferred block size (vs Block#getNumBytes used by newnode), so we will again subtract out the difference between the full block size and the number of bytes actually stored: {code} long dsOld = oldnode.diskspaceConsumed(); ... //check if disk space needs to be updated. long dsNew = 0; if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) { try { updateSpaceConsumed(path, 0, dsNew-dsOld); ... 
{code} So in the above example we started with diskspace at 384mb (3 * 128mb) and then we subtract 375mb (to reflect that only 9mb raw was actually used) twice, so for each file the diskspace for the directory is -366mb (384mb minus 2 * 375mb). Which is why the quota goes negative and yet we can still write more files. So a directory with lots of single-block files (if a file has multiple blocks, only the final partial block ends up subtracting from the diskspace used) ends up having a quota that's way off. I think the fix is, in FSDirectory#replaceNode, not to have the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I'll work on a patch which also adds a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not reflect the cached value. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
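The per-file arithmetic in the example can be checked mechanically; a small sketch using the numbers above (128mb preferred block size, 3mb actual, replication 3):
{code}
public class QuotaDrift {
    public static void main(String[] args) {
        long blockSize = 128L << 20;          // preferred block size: 128mb
        long actual    = 3L << 20;            // bytes actually written: 3mb
        int  repl      = 3;

        long charged = blockSize * repl;            // addBlock: 384mb charged
        long diff    = (blockSize - actual) * repl; // 375mb correction
        long cached  = charged - diff - diff;       // subtracted in addStoredBlock
                                                    // AND again in replaceNode
        System.out.println(cached / (1 << 20) + "mb"); // prints -366mb per file
    }
}
{code}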
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032692#comment-13032692 ] Matt Foley commented on HDFS-1505: -- Hi Aaron, agree with you that storage directories of type IMAGE_AND_EDITS are a distinct NameNodeDirType. However, my understanding of NNStorage.getNumStorageDirs(NameNodeDirType), and NameNodeDirType.isOfType() is that membership queries (iterators or counts) about storage dirs of type EDITS return answers relating to all storage dirs of type EDITS || IMAGE_AND_EDITS, while queries about storage dirs of type IMAGE return answers relating to all storage dirs of type IMAGE || IMAGE_AND_EDITS. That is, isOfType() is permissive rather than exclusive. I could be wrong of course :-) as it's possible I didn't correctly follow overloaded implementations. Please let me know if so. Thanks. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
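For illustration, the permissive membership Matt describes would look roughly like the enum below; this mirrors the behavior he attributes to NameNodeDirType.isOfType(), not the actual source:
{code}
public class DirTypeSketch {
    enum DirType {
        IMAGE, EDITS, IMAGE_AND_EDITS;

        boolean isOfType(DirType t) {
            // A combined directory answers true for either component type.
            if (this == IMAGE_AND_EDITS) return t == IMAGE || t == EDITS || t == this;
            return this == t;
        }
    }

    public static void main(String[] args) {
        System.out.println(DirType.IMAGE_AND_EDITS.isOfType(DirType.EDITS)); // true
        System.out.println(DirType.IMAGE.isOfType(DirType.EDITS));           // false
    }
}
{code}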
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032697#comment-13032697 ] Aaron T. Myers commented on HDFS-1505: -- bq. That is, isOfType() is permissive rather than exclusive. You are quite correct. My mistake. The original logic you posted for the check seems to be correct. bq. Also, since HDFS-1826 copied the concurrent saveNamespace() logic into FSImage.doUpgrade(), would you please add the same code fragment to the end of doUpgrade(), and a corresponding corruption unit test case to TestDFSUpgrade? Thanks. It occurs to me now that the failure handling should perhaps be different between these two cases. i.e. it is acceptable to tolerate some number of storage directory failures during save namespace, but we should perhaps throw an error in the event *any* storage directories fail during upgrade. Thoughts? saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032695#comment-13032695 ] Hadoop QA commented on HDFS-1917: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12479007/HDFS-1917.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/506//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/506//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/506//console This message is automatically generated. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917-1.patch, HDFS-1917.patch For trunk, the build and deployment tree look like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's the third party dependent jar files should be fetch from hadoop-common. However, it is currently fetching from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list, continue to enhance ant build structure to fetch and filter jar file dependencies using ivy. On the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1814) HDFS portion of HADOOP-7214 - Hadoop /usr/bin/groups equivalent
[ https://issues.apache.org/jira/browse/HDFS-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032698#comment-13032698 ] Aaron T. Myers commented on HDFS-1814: -- The test failures are unrelated to this patch. HDFS portion of HADOOP-7214 - Hadoop /usr/bin/groups equivalent --- Key: HDFS-1814 URL: https://issues.apache.org/jira/browse/HDFS-1814 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client, name-node Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: hdfs-1814.0.txt, hdfs-1814.1.txt, hdfs-1814.2.txt, hdfs-1814.3.patch, hdfs-1814.4.patch, hdfs-1814.5.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032701#comment-13032701 ] Bharath Mundlapudi commented on HDFS-1592: -- Thanks for the review, Jitendra. 1. The conditions are there for better readability. Yes, we can change this into one condition. 2. Error is logged where the exception is caught. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1899) GenericTestUtils.formatNamenode is misplaced
[ https://issues.apache.org/jira/browse/HDFS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1899: -- Status: Patch Available (was: Open) GenericTestUtils.formatNamenode is misplaced Key: HDFS-1899 URL: https://issues.apache.org/jira/browse/HDFS-1899 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Ted Yu Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1899.patch This function belongs in DFSTestUtil, the standard place for putting cluster-related utils. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032705#comment-13032705 ] Matt Foley commented on HDFS-1505: -- Good question. I don't know. Let's both ask our ops teams. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032714#comment-13032714 ] Suresh Srinivas commented on HDFS-1905: --- bq. we don't want to pollute the mental model of HDFS for new users by making them understand cluster IDs, block pools, etc. I disagree with you on this. Though cluster ID is being added as part of federation, I do not think it pollutes the mental model. What is the cluster today? It is all the nodes sharing the same namespaceID, which is automatically generated and shared by all the nodes. Cluster ID makes it much cleaner: a user-identifiable name is shared by all the nodes and identifies all the nodes in the cluster. I am not sure this is such a complicated idea that it disrupts the HDFS model. Further, even without federation, we should have had such an identifier in the first place, instead of namespaceID, which happened to become the cluster ID equivalent. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1921: - Status: Patch Available (was: Open) Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: hdfs1921_v23.patch I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1921: - Attachment: hdfs1921_v23.patch Here's a patch for trunk, so it will run under auto-test. I'll post the v22 version when it passes. The HDFS-1505 test case should work if this patch is added. Can you please try it, as I was getting a failure to unlock the storage dir upon FSNamesystem.close(). Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: hdfs1921_v23.patch I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032719#comment-13032719 ] Todd Lipcon commented on HDFS-1905: --- I agree that cluster ID is a nicer construct than namespace ID. But it doesn't replace it, since we still have the namespaceID in NNStorage. Perhaps a nice compromise would be the following: - hadoop namenode -format gains a required argument for cluster ID, i.e. hadoop namenode -format mycluster. If you don't specify this, it should print usage info. - hadoop namenode -upgrade by default will carry over the old namespaceID as the new cluster's cluster ID? Alternatively one may provide a cluster ID with hadoop namenode -upgrade -clusterid foo? Another question: if cluster ID is meant to be a user-visible nice name -- how can one rename a cluster? Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
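A minimal sketch of the first bullet above: require the cluster ID argument and print usage instead of surfacing a bare IllegalArgumentException. Purely illustrative; the real NameNode option parsing is more involved:
{code}
public class FormatArgs {
    public static void main(String[] args) {
        if (args.length >= 1 && args[0].equals("-format")) {
            if (args.length < 2) {
                // Tell the user what to do rather than dumping a stack trace.
                System.err.println("Usage: hadoop namenode -format <clusterId>");
                System.exit(1);
            }
            System.out.println("formatting with cluster ID " + args[1]);
        }
    }
}
{code}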
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032725#comment-13032725 ] Hadoop QA commented on HDFS-1917: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12479013/HDFS-1917-1.patch against trunk revision 1102467. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/507//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/507//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/507//console This message is automatically generated. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917-1.patch, HDFS-1917.patch For trunk, the build and deployment tree look like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's the third party dependent jar files should be fetch from hadoop-common. However, it is currently fetching from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list, continue to enhance ant build structure to fetch and filter jar file dependencies using ivy. On the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1921: - Attachment: hdfs-1505-1-test.txt Here's the modified form of the test that works - there was a glitch in spy storage setup. The test passes. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: hdfs-1505-1-test.txt, hdfs1921_v23.patch I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032731#comment-13032731 ] Jonathan Hsieh commented on HDFS-1787: -- After more investigation, these two may be newly incurred errors. org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader The other three seem flaky on trunk. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
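For illustration, a sketch of the second option in the description -- a generic error code with an attached string -- over an invented wire format. The real data transfer protocol's status constants and framing are not shown here; this only demonstrates how a message could reach the client instead of a bare EOFException:
{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TransferStatus {
    static final int SUCCESS = 0;
    static final int ERROR   = 1;  // generic error; human-readable message follows

    // Datanode side: reply with a code and an explanatory string.
    static void writeError(DataOutputStream out, String msg) throws IOException {
        out.writeInt(ERROR);
        out.writeUTF(msg);         // e.g. "xceiverCount 4097 exceeds limit 4096"
        out.flush();
    }

    // Client side: turn the reply into an exception the user can act on.
    static void checkResponse(DataInputStream in) throws IOException {
        if (in.readInt() != SUCCESS) {
            throw new IOException("datanode error: " + in.readUTF());
        }
    }
}
{code}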