[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with symlink error.

2014-04-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966272#comment-13966272
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6233:
---

Please also change HardLink.createHardLinkMult so that, if the command fails, 
hardLinkCommand is included in the exception message.
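
A minimal sketch of that suggestion, assuming a simplified method shape (the 
real createHardLinkMult builds the command from the source files and target 
directory; everything here is illustrative, not the actual Hadoop source):

{code}
import java.io.IOException;
import java.util.Arrays;

class HardLinkSketch {
  // Run the platform hard-link command and, on failure, surface the exact
  // command line in the exception so failures like the one logged below
  // are debuggable.
  static void runHardLinkCommand(String[] hardLinkCommand) throws IOException {
    try {
      Process p = Runtime.getRuntime().exec(hardLinkCommand);
      int exitCode = p.waitFor();
      if (exitCode != 0) {
        throw new IOException("Hard link command failed with exit code "
            + exitCode + ": " + Arrays.toString(hardLinkCommand));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(
          "Interrupted while running " + Arrays.toString(hardLinkCommand), e);
    }
  }
}
{code}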

 Datanode upgrade in Windows from 1.x to 2.4 fails with symlink error.
 -

 Key: HDFS-6233
 URL: https://issues.apache.org/jira/browse/HDFS-6233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, tools
Affects Versions: 2.4.0
 Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
 Attachments: HDFS-6233.01.patch


 I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start 
 due to a hard link exception.
 Repro steps:
 *Installed Hadoop 1.x
 *hadoop dfsadmin -safemode enter
 *hadoop dfsadmin -saveNamespace
 *hadoop namenode -finalize
 *Stop all services
 *Uninstall Hadoop 1.x 
 *Install Hadoop 2.4 
 *Start namenode with -upgrade option
 *Try to start datanode, begin to see Hardlink exception in datanode service 
 log.
 {code}
 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 8010: starting
 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Data-node version: -55 and name-node layout version: -56
 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Upgrading storage directory d:\hadoop\data\hdfs\dn.
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory 
 d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
 2014-04-10 22:47:12,254 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (Datanode Uuid unassigned) service to 
 myhost/10.0.0.1:8020
 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect 
 command line arguments.
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
   at java.lang.Thread.run(Thread.java:722)
 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Ending block pool service for: Block pool registering (Datanode Uuid 
 unassigned) service to myhost/10.0.0.1:8020
 2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Block pool ID needed, but service not yet registered with NN
 java.lang.Exception: trace
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
   

[jira] [Created] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.

2014-04-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-6234:
---

 Summary: TestDatanodeConfig#testMemlockLimit fails on Windows due 
to invalid file path.
 Key: HDFS-6234
 URL: https://issues.apache.org/jira/browse/HDFS-6234
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial


{{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} due 
to an invalid URI configured in {{dfs.datanode.data.dir}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6234:


Status: Patch Available  (was: Open)

 TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
 --

 Key: HDFS-6234
 URL: https://issues.apache.org/jira/browse/HDFS-6234
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6234.1.patch


 {{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} 
 due to an invalid URI configured in {{dfs.datanode.data.dir}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6234:


Attachment: HDFS-6234.1.patch

I'm attaching a patch that sets a valid URI in {{dfs.datanode.data.dir}}.  
While I was in here, I also made some minor changes to make sure every created 
{{DataNode}} gets shut down.  I ran the test successfully on Mac and Windows 
with this patch.
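
For illustration, the general pattern for producing a valid data-dir value on 
Windows is to go through java.net.URI rather than passing a raw path; a 
minimal sketch of that pattern (not the attached patch):

{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

class DataDirConfigSketch {
  // A Windows path like "C:\temp\data" is not a valid URI; converting the
  // File to a URI yields "file:/C:/temp/data", which the DataNode accepts.
  static void setValidDataDir(Configuration conf, File baseDir) {
    conf.set(DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY,
        new File(baseDir, "data").toURI().toString());
  }
}
{code}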

 TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
 --

 Key: HDFS-6234
 URL: https://issues.apache.org/jira/browse/HDFS-6234
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6234.1.patch


 {{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} 
 due to an invalid URI configured in {{dfs.datanode.data.dir}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6232:


Attachment: HDFS-6232.patch

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when parsing an 
 edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966282#comment-13966282
 ] 

Akira AJISAKA commented on HDFS-6232:
-

I reproduced the error. It occurs because {{XMLUtils.addSaxString}} can't 
handle a null ACL entry name. The name is optional, so it can be null.
I attached a patch that adds a null check.
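
A sketch of the null-check idea (the XML element names here are assumptions; 
the actual change is in HDFS-6232.patch):

{code}
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.hdfs.util.XMLUtils;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

class AclXmlSketch {
  static void appendAclEntryToXml(ContentHandler handler, AclEntry e)
      throws SAXException {
    XMLUtils.addSaxString(handler, "SCOPE", e.getScope().name());
    XMLUtils.addSaxString(handler, "TYPE", e.getType().name());
    if (e.getName() != null) {
      // The name is optional; passing null here is what triggered the NPE.
      XMLUtils.addSaxString(handler, "NAME", e.getName());
    }
    XMLUtils.addSaxString(handler, "PERM", e.getPermission().name());
  }
}
{code}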

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when parsing an 
 edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6232:


Status: Patch Available  (was: Open)

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.4.0, 3.0.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when parsing an 
 edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-6235:
---

 Summary: TestFileJournalManager can fail on Windows due to file 
locking if tests run out of order.
 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


{{TestFileJournalManager}} has multiple tests that reuse the same storage 
directory: /filejournaltest2.  The last test in the suite intentionally leaves 
a file open to test behavior of an unclosed edit log.  On some environments 
though, tests within a suite execute out of order.  In this case, a lock is 
still held on /filejournaltest2, and subsequent tests fail trying to delete the 
directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6235:


Attachment: HDFS-6235.1.patch

The easiest thing to do here is simply to make sure that each test in the suite 
uses a unique storage directory.  That way, there is no chance of collision on 
locked files between multiple tests in the suite.  At the end of the test 
suite, all of these file handles will get released automatically during process 
exit.  I'm attaching a patch that changes the storage directory names to match 
the names of the individual tests.
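
A sketch of that pattern using JUnit's TestName rule (illustrative only; the 
actual patch renames the directories directly):

{code}
import java.io.File;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestName;

public class UniqueStorageDirSketch {
  @Rule
  public TestName name = new TestName();

  // Each test method gets its own storage directory, so a file left locked
  // by one test can never block cleanup in another.
  private File storageDir() {
    return new File(System.getProperty("test.build.data", "target/test/data"),
        "filejournaltest-" + name.getMethodName());
  }

  @Test
  public void testExample() {
    File dir = storageDir();   // unique per test method
    // ... set up a FileJournalManager against 'dir' ...
  }
}
{code}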

 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6235:


Priority: Trivial  (was: Major)

 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6235:


Status: Patch Available  (was: Open)

 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966298#comment-13966298
 ] 

Jing Zhao commented on HDFS-6235:
-

+1 pending Jenkins.

 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6229) Race condition in failover can cause RetryCache fail to work

2014-04-11 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966308#comment-13966308
 ] 

Suresh Srinivas commented on HDFS-6229:
---

+1 for the patch. Thanks Jing for working on this. 

 Race condition in failover can cause RetryCache fail to work
 

 Key: HDFS-6229
 URL: https://issues.apache.org/jira/browse/HDFS-6229
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.1.0-beta
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-6229.000.patch, retrycache-race.patch


 Currently when NN failover happens, the old SBN first sets its state to 
 active, then starts the active services (including tailing all the remaining 
 editlog and building a complete retry cache based on the editlog). If a retry 
 request, which has already succeeded in the old ANN (but the client fails to 
 receive the response), comes in between, this retry may still get served by 
 the new ANN but miss the retry cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.

2014-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966348#comment-13966348
 ] 

Hadoop QA commented on HDFS-6234:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639740/HDFS-6234.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6650//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6650//console

This message is automatically generated.

 TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
 --

 Key: HDFS-6234
 URL: https://issues.apache.org/jira/browse/HDFS-6234
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6234.1.patch


 {{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} 
 due to an invalid URI configured in {{dfs.datanode.data.dir}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966358#comment-13966358
 ] 

Hadoop QA commented on HDFS-6232:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639741/HDFS-6232.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1293 javac 
compiler warnings (more than the trunk's current 1287 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6651//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6651//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6651//console

This message is automatically generated.

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when parsing an 
 edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6110) adding more slow action log in critical write path

2014-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966361#comment-13966361
 ] 

Colin Patrick McCabe commented on HDFS-6110:


{code}
+  public static final int DFS_SLOW_IO_WARNING_THRESHOLD_DEFAULT = 300;
{code}

It's odd that this is an int, given that we retrieve the threshold as a long 
later on.  This seems likely to lead to confusion; can we just make this a 
long everywhere?

+1 after that's addressed
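
For clarity, the suggestion amounts to keeping the constant and the read site 
the same width; a sketch (the config key name here is assumed for 
illustration):

{code}
import org.apache.hadoop.conf.Configuration;

class SlowIoThresholdSketch {
  static final String DFS_SLOW_IO_WARNING_THRESHOLD_KEY =
      "dfs.datanode.slow.io.warning.threshold.ms"; // assumed key name
  // Declared long so it matches Configuration#getLong at the read site.
  static final long DFS_SLOW_IO_WARNING_THRESHOLD_DEFAULT = 300;

  static long slowIoThresholdMs(Configuration conf) {
    return conf.getLong(DFS_SLOW_IO_WARNING_THRESHOLD_KEY,
        DFS_SLOW_IO_WARNING_THRESHOLD_DEFAULT);
  }
}
{code}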

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, 
 HDFS-6110v4.txt


 After digging into an HBase write spike caused by slow buffered I/O in our 
 cluster, we realized we should add more warning logs for abnormal latency in 
 the write path, so that if others hit an HLog sync spike, we can get more 
 detailed info from the HDFS side at the same time.
 Patch will be uploaded soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966367#comment-13966367
 ] 

Hadoop QA commented on HDFS-6235:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639744/HDFS-6235.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.fsdataset.TestAvailableSpaceVolumeChoosingPolicy

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6652//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6652//console

This message is automatically generated.

 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6231) DFSClient hangs infinitely if using hedged reads and all eligible datanodes die.

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966406#comment-13966406
 ] 

Hudson commented on HDFS-6231:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #537 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/537/])
HDFS-6231. DFSClient hangs infinitely if using hedged reads and all eligible 
datanodes die. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586551)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java


 DFSClient hangs infinitely if using hedged reads and all eligible datanodes 
 die.
 

 Key: HDFS-6231
 URL: https://issues.apache.org/jira/browse/HDFS-6231
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6231.1.patch


 When using hedged reads, and all eligible datanodes for the read get flagged 
 as dead or ignored, then the client is supposed to refetch block locations 
 from the NameNode to retry the read.  Instead, we've seen that the client can 
 hang indefinitely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6224) Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966416#comment-13966416
 ] 

Hudson commented on HDFS-6224:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #537 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/537/])
Undo accidental FSNamesystem change introduced in HDFS-6224 commit. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586515)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
HDFS-6224. Add a unit test to TestAuditLogger for file permissions passed to 
logAuditEvent. Contributed by Charles Lamb. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586490)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java


 Add a unit test to TestAuditLogger for file permissions passed to 
 logAuditEvent
 ---

 Key: HDFS-6224
 URL: https://issues.apache.org/jira/browse/HDFS-6224
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6224.001.patch, HDFS-6224.002.patch, 
 HDFS-6224.003.patch, HDFS-6224.004.patch


 Add a unit test which verifies behavior of HADOOP-9155. Specifically, ensure 
 that during a setPermission operation the permission returned is the one that 
 was just set, not the permission before the operation.
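
A sketch of the test idea (assuming the pluggable {{AuditLogger}} interface 
keeps its 2.x shape; the real test is in the attached patches):

{code}
import java.net.InetAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.server.namenode.AuditLogger;

public class PermissionCapturingAuditLogger implements AuditLogger {
  // Records the permission handed to logAuditEvent so a test can assert it
  // equals the permission that setPermission just applied.
  static volatile FsPermission lastPermission;

  @Override
  public void initialize(Configuration conf) {}

  @Override
  public void logAuditEvent(boolean succeeded, String userName,
      InetAddress addr, String cmd, String src, String dst, FileStatus stat) {
    if (stat != null) {
      lastPermission = stat.getPermission();
    }
  }
}
// In the test body, roughly:
//   fs.setPermission(path, new FsPermission((short) 0777));
//   assertEquals((short) 0777,
//       PermissionCapturingAuditLogger.lastPermission.toShort());
{code}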



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5669) Storage#tryLock() should check for null before logging successfull message

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966411#comment-13966411
 ] 

Hudson commented on HDFS-5669:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #537 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/537/])
HDFS-5669. Storage#tryLock() should check for null before logging successfull 
message. Contributed by Vinayakumar B (umamahesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586392)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java


 Storage#tryLock() should check for null before logging successfull message
 --

 Key: HDFS-5669
 URL: https://issues.apache.org/jira/browse/HDFS-5669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-5669.patch, HDFS-5669.patch


 In the following code in Storage#tryLock(), {{file.getChannel().tryLock()}} 
 can return null if the lock is held by some other process. In that case, even 
 though the return value is null, a success message is still logged, which is 
 confusing.
 {code}try {
 res = file.getChannel().tryLock();
 file.write(jvmName.getBytes(Charsets.UTF_8));
 LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
   } catch(OverlappingFileLockException oe) {{code}
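
A sketch of the fix (simplified from Storage; a stand-in replaces LOG here):

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.charset.StandardCharsets;

class TryLockSketch {
  // tryLock() returning null means another process holds the lock; return
  // early instead of logging a misleading "acquired" message.
  static FileLock tryLock(RandomAccessFile file, String lockF, String jvmName)
      throws IOException {
    FileLock res;
    try {
      res = file.getChannel().tryLock();
      if (res == null) {
        return null; // lock held elsewhere: no success message
      }
      file.write(jvmName.getBytes(StandardCharsets.UTF_8));
      System.out.println("Lock on " + lockF
          + " acquired by nodename " + jvmName); // stand-in for LOG.info
    } catch (OverlappingFileLockException oe) {
      return null; // this JVM already holds an overlapping lock
    }
    return res;
  }
}
{code}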



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6203) check other namenode's state before HAadmin transitionToActive

2014-04-11 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-6203.
--

Resolution: Duplicate

 check other namenode's state before HAadmin transitionToActive
 --

 Key: HDFS-6203
 URL: https://issues.apache.org/jira/browse/HDFS-6203
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha
Affects Versions: 2.3.0
Reporter: patrick white
Assignee: Kihwal Lee

 Current behavior is that the HAadmin -transitionToActive command will 
 complete the transition to Active even if the other namenode is already in 
 Active state. This is not an allowed condition and should be handled by 
 fencing; however, setting both namenodes active can happen accidentally with 
 relative ease, especially in a production environment when performing manual 
 maintenance operations. 
 If this situation does occur, it is very serious and will likely cause data 
 loss or, at best, require a difficult recovery to avoid data loss.
 This is requesting an enhancement to haadmin's -transitionToActive command, 
 to have HAadmin check the Active state of the other namenode before 
 completing the transition. If the other namenode is Active, then fail the 
 request because the other NN is already active.
 Not sure if there is a scenario where both namenodes being Active is valid 
 or desired, but to maintain functional compatibility a 'force' parameter 
 could be added to override this check and allow the previous behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-04-11 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-2949:


Assignee: Kihwal Lee

 HA: Add check to active state transition to prevent operator-induced split 
 brain
 

 Key: HDFS-2949
 URL: https://issues.apache.org/jira/browse/HDFS-2949
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 0.24.0
Reporter: Todd Lipcon
Assignee: Kihwal Lee

 Currently, if the administrator mistakenly calls -transitionToActive on one 
 NN while the other one is still active, all hell will break loose. We can add 
 a simple check by having the NN make a getServiceState() RPC to its peer with 
 a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
 node is active, it should refuse to enter active mode. If the RPC fails or 
 indicates standby, it can proceed.
 This is just meant as a preventative safety check - we still expect users to 
 use the -failover command which has other checks plus fencing built in.
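
A sketch of the proposed guard using the existing HA RPC interface (placement 
and error handling are simplified; the short timeout would be configured on 
the proxy):

{code}
import java.io.IOException;
import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;

class ActiveTransitionGuardSketch {
  // Returns false only when the peer answers and reports ACTIVE. If the
  // short-timeout RPC fails, the peer is presumed down or standby and we
  // proceed, relying on -failover's fencing for the hard cases.
  static boolean safeToGoActive(HAServiceProtocol peer) {
    try {
      HAServiceState state = peer.getServiceStatus().getState();
      return state != HAServiceState.ACTIVE;
    } catch (IOException e) {
      return true;
    }
  }
}
{code}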



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-04-11 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2949:
-

Target Version/s: 2.5.0  (was: 0.24.0)

 HA: Add check to active state transition to prevent operator-induced split 
 brain
 

 Key: HDFS-2949
 URL: https://issues.apache.org/jira/browse/HDFS-2949
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 0.24.0
Reporter: Todd Lipcon
Assignee: Kihwal Lee

 Currently, if the administrator mistakenly calls -transitionToActive on one 
 NN while the other one is still active, all hell will break loose. We can add 
 a simple check by having the NN make a getServiceState() RPC to its peer with 
 a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
 node is active, it should refuse to enter active mode. If the RPC fails or 
 indicates standby, it can proceed.
 This is just meant as a preventative safety check - we still expect users to 
 use the -failover command which has other checks plus fencing built in.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966479#comment-13966479
 ] 

Akira AJISAKA commented on HDFS-6232:
-

The javac warnings in the test code are due to the use of 
{{com.sun.org.apache.xml.internal.serialize.XMLSerializer}} and 
{{com.sun.org.apache.xml.internal.serialize.OutputFormat}}. These classes are 
already used by OfflineEditsViewer.
I suggest using Xerces instead.

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when parsing an 
 edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6224) Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966505#comment-13966505
 ] 

Hudson commented on HDFS-6224:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1729 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1729/])
Undo accidental FSNamesystem change introduced in HDFS-6224 commit. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586515)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
HDFS-6224. Add a unit test to TestAuditLogger for file permissions passed to 
logAuditEvent. Contributed by Charles Lamb. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586490)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java


 Add a unit test to TestAuditLogger for file permissions passed to 
 logAuditEvent
 ---

 Key: HDFS-6224
 URL: https://issues.apache.org/jira/browse/HDFS-6224
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6224.001.patch, HDFS-6224.002.patch, 
 HDFS-6224.003.patch, HDFS-6224.004.patch


 Add a unit test which verifies behavior of HADOOP-9155. Specifically, ensure 
 that during a setPermission operation the permission returned is the one that 
 was just set, not the permission before the operation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966582#comment-13966582
 ] 

Akira AJISAKA commented on HDFS-6232:
-

bq. I suggest using Xerces instead.
I tried to use {{org.apache.xml.serialize.XMLSerializer}} from Apache Xerces, 
but it is deprecated. I'm thinking we should use another library or write our 
own code.

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when parsing an 
 edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5669) Storage#tryLock() should check for null before logging successfull message

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966613#comment-13966613
 ] 

Hudson commented on HDFS-5669:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1754 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1754/])
HDFS-5669. Storage#tryLock() should check for null before logging successfull 
message. Contributed by Vinayakumar B (umamahesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586392)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java


 Storage#tryLock() should check for null before logging successfull message
 --

 Key: HDFS-5669
 URL: https://issues.apache.org/jira/browse/HDFS-5669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-5669.patch, HDFS-5669.patch


 In the following code in Storage#tryLock(), {{file.getChannel().tryLock()}} 
 can return null if the lock is held by some other process. In that case, even 
 though the return value is null, a success message is still logged, which is 
 confusing.
 {code}try {
 res = file.getChannel().tryLock();
 file.write(jvmName.getBytes(Charsets.UTF_8));
 LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
   } catch(OverlappingFileLockException oe) {{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6224) Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966618#comment-13966618
 ] 

Hudson commented on HDFS-6224:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1754 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1754/])
Undo accidental FSNamesystem change introduced in HDFS-6224 commit. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586515)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
HDFS-6224. Add a unit test to TestAuditLogger for file permissions passed to 
logAuditEvent. Contributed by Charles Lamb. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586490)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java


 Add a unit test to TestAuditLogger for file permissions passed to 
 logAuditEvent
 ---

 Key: HDFS-6224
 URL: https://issues.apache.org/jira/browse/HDFS-6224
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 2.5.0

 Attachments: HDFS-6224.001.patch, HDFS-6224.002.patch, 
 HDFS-6224.003.patch, HDFS-6224.004.patch


 Add a unit test which verifies behavior of HADOOP-9155. Specifically, ensure 
 that during a setPermission operation the permission returned is the one that 
 was just set, not the permission before the operation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6231) DFSClient hangs infinitely if using hedged reads and all eligible datanodes die.

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966608#comment-13966608
 ] 

Hudson commented on HDFS-6231:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1754 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1754/])
HDFS-6231. DFSClient hangs infinitely if using hedged reads and all eligible 
datanodes die. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586551)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java


 DFSClient hangs infinitely if using hedged reads and all eligible datanodes 
 die.
 

 Key: HDFS-6231
 URL: https://issues.apache.org/jira/browse/HDFS-6231
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6231.1.patch


 When using hedged reads, and all eligible datanodes for the read get flagged 
 as dead or ignored, then the client is supposed to refetch block locations 
 from the NameNode to retry the read.  Instead, we've seen that the client can 
 hang indefinitely.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths

2014-04-11 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966638#comment-13966638
 ] 

Daryn Sharp commented on HDFS-6143:
---

bq. [...] It may not work since the seek offset is missing in open. There is no 
way to calculate the redirect.

I have an internal 0.23 patch that I'm reworking for 2.x that will actually use 
the block locations for opens - HDFS-6221.  The current incarnation fetches 
block locations for just the offset when the http connection stream is opened, 
but it could easily be changed to fetch all the block locations when the 
webhdfs open is called - which will elicit a FNF - and then use the locations 
for a given offset when the http connection stream is opened.

bq. Short term, looking at the ByteRangeInputStream, it's inefficient in that 
for even a single byte forward seek (seek(getPos()+1)), it closes the connection 
and re-opens it [...] read-ahead for short range seeks, which is a lot more 
efficient

Yes, I've already tinkered with fixing this very problem.  Internally we found 
that only a fraction of jobs actually perform seeks after the split-offset seek, 
and those that did would only do so maybe 1-2 times, so it was deemed a 
low-priority fix.
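
A sketch of the read-ahead idea for short forward seeks (not the actual 
ByteRangeInputStream; the cutoff and the reopen helper are made up):

{code}
import java.io.IOException;
import java.io.InputStream;

class ShortSeekSketch {
  private InputStream in;   // the currently open HTTP response stream
  private long pos;
  private final byte[] drain = new byte[4096];
  private static final long READAHEAD_LIMIT = 64 * 1024; // assumed cutoff

  ShortSeekSketch(InputStream in, long startPos) {
    this.in = in;
    this.pos = startPos;
  }

  // For a short forward seek, consume bytes on the existing connection
  // instead of closing it and issuing a new ranged request.
  void seek(long target) throws IOException {
    long diff = target - pos;
    if (diff >= 0 && diff <= READAHEAD_LIMIT) {
      while (diff > 0) {
        int n = in.read(drain, 0, (int) Math.min(drain.length, diff));
        if (n < 0) {
          throw new IOException("EOF while draining forward seek");
        }
        diff -= n;
      }
    } else {
      in.close();
      in = reopenAt(target); // hypothetical helper: new Range request
    }
    pos = target;
  }

  private InputStream reopenAt(long offset) throws IOException {
    throw new UnsupportedOperationException("issue a new Range request here");
  }
}
{code}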

 WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
 paths
 

 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
 HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
 HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
 HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch


 WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle 
 non-existing paths. 
 - 'open' does not really open anything, i.e., it does not contact the 
 server, and therefore cannot discover FileNotFound; that is deferred until the 
 next read. It's counterintuitive and not how local FS or HDFS work. In POSIX 
 you get ENOENT on open. 
 [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
  is an example of code that's broken because of this.
 - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
 instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966655#comment-13966655
 ] 

Hudson commented on HDFS-6234:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5500 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5500/])
HDFS-6234. TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid 
file path. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586682)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeConfig.java


 TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
 --

 Key: HDFS-6234
 URL: https://issues.apache.org/jira/browse/HDFS-6234
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6234.1.patch


 {{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} 
 due to an invalid URI configured in {{dfs.datanode.data.dir}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6214) Webhdfs has poor throughput for files >2GB

2014-04-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966656#comment-13966656
 ] 

Kihwal Lee commented on HDFS-6214:
--

Related discussion at
  
http://stackoverflow.com/questions/9031311/slow-transfers-in-jetty-with-chunked-transfer-encoding-at-certain-buffer-size

So, if io.file.buffer.size is small enough, like 4K (the default), it may be 
overall slower, but there will be no difference for files > 2GB.  Do you know 
what the response buffer size is for this type of webhdfs response from 
datanodes? 
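
For reference, the property under discussion is the standard Hadoop I/O buffer size; a minimal sketch of reading it (4096 bytes is the shipped default):

{code}
// The buffer size setting Kihwal refers to; 4 KB is the default
// shipped in core-default.xml.
Configuration conf = new Configuration();
int bufferSize = conf.getInt("io.file.buffer.size", 4096);
{code}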




 Webhdfs has poor throughput for files >2GB
 --

 Key: HDFS-6214
 URL: https://issues.apache.org/jira/browse/HDFS-6214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-6214.patch


 For the DN's open call, jetty returns a Content-Length header for files < 2GB, 
 and uses chunking for files > 2GB.  A bug in jetty's buffer handling results 
 in a ~8X reduction in throughput.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6234:


   Resolution: Fixed
Fix Version/s: 2.4.1
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I committed this to trunk, branch-2 and branch-2.4.  Thanks for the review, 
Jing!

 TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
 --

 Key: HDFS-6234
 URL: https://issues.apache.org/jira/browse/HDFS-6234
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6234.1.patch


 {{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} 
 due to an invalid URI configured in {{dfs.datanode.data.dir}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1393#comment-1393
 ] 

Chris Nauroth commented on HDFS-6235:
-

The failure in {{TestAvailableSpaceVolumeChoosingPolicy}} looks unrelated, and 
I can't repro it.  I'm going to commit this.

 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.
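
One conventional guard against this failure mode, sketched below assuming JUnit 4 (the directory-naming helper is an illustration, not the actual patch):

{code}
// Sketch: derive a distinct storage directory from the running test's
// name, so a file left open by one test can never hold a lock on a
// directory that another test must delete.
@Rule
public TestName testName = new TestName();

private File getStorageDir() {
  return new File(System.getProperty("test.build.data", "build/test/data"),
      "filejournaltest-" + testName.getMethodName());
}
{code}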



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6214) Webhdfs has poor throughput for files >2GB

2014-04-11 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1392#comment-1392
 ] 

Kihwal Lee commented on HDFS-6214:
--

Is flush() necessary for the non-chunked case? Wouldn't it hurt performance in 
some cases?

 Webhdfs has poor throughput for files >2GB
 --

 Key: HDFS-6214
 URL: https://issues.apache.org/jira/browse/HDFS-6214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-6214.patch


 For the DN's open call, jetty returns a Content-Length header for files < 2GB, 
 and uses chunking for files > 2GB.  A bug in jetty's buffer handling results 
 in a ~8X reduction in throughput.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6235:


   Resolution: Fixed
Fix Version/s: 2.4.1
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Jing, thank you for the code review.  I committed this to trunk, branch-2 and 
branch-2.4.

 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966677#comment-13966677
 ] 

Hudson commented on HDFS-6235:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5502 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5502/])
HDFS-6235. TestFileJournalManager can fail on Windows due to file locking if 
tests run out of order. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586692)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java


 TestFileJournalManager can fail on Windows due to file locking if tests run 
 out of order.
 -

 Key: HDFS-6235
 URL: https://issues.apache.org/jira/browse/HDFS-6235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6235.1.patch


 {{TestFileJournalManager}} has multiple tests that reuse the same storage 
 directory: /filejournaltest2.  The last test in the suite intentionally 
 leaves a file open to test behavior of an unclosed edit log.  On some 
 environments though, tests within a suite execute out of order.  In this 
 case, a lock is still held on /filejournaltest2, and subsequent tests fail 
 trying to delete the directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6189) Multiple HDFS tests fail on Windows attempting to use a test root path containing a colon.

2014-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966682#comment-13966682
 ] 

Colin Patrick McCabe commented on HDFS-6189:


Thanks for fixing this, [~cnauroth].

 Multiple HDFS tests fail on Windows attempting to use a test root path 
 containing a colon.
 --

 Key: HDFS-6189
 URL: https://issues.apache.org/jira/browse/HDFS-6189
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.4.1

 Attachments: HDFS-6189.1.patch


 Some HDFS tests are attempting to use a test root path based on the 
 test.root.dir that we've defined for use on the local file system.  This 
 doesn't work on Windows because of the drive spec, i.e. C:.  HDFS rejects 
 paths containing a colon as invalid.
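
A minimal illustration of the failure mode and the usual fix (the exact helper used by the patch is not shown here; this is a sketch):

{code}
// Failure mode: a test root derived from the local file system
// property contains a drive spec on Windows.
String local = System.getProperty("test.build.data");  // e.g. "C:\\build\\test"
Path bad = new Path(local, "myTest");  // "C:/build/test/myTest"
// -> rejected by HDFS: paths may not contain ':'

// Usual fix: use an HDFS-only absolute path for anything created
// inside the MiniDFSCluster namespace.
Path good = new Path("/test/" + getClass().getSimpleName() + "/myTest");
{code}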



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6168) Remove deprecated methods in DistributedFileSystem

2014-04-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966692#comment-13966692
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6168:
---

When talking about API compatibility, we mean only public APIs, not private 
(or LimitedPrivate) APIs.  DistributedFileSystem is similar to DFSClient: 
neither is a public API, and user applications should not use them directly.  
If they do, they should expect to change their code across releases, since 
these classes are unstable.  BTW, the methods removed in this JIRA had been 
deprecated for a long time.
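
Hadoop encodes that contract directly on the class; roughly (a sketch of the annotation shape, not the exact source; the audience list here is an assumption):

{code}
// A LimitedPrivate, Unstable class owes no cross-release
// compatibility to end-user applications.
@InterfaceAudience.LimitedPrivate({ "MapReduce", "HBase" })
@InterfaceStability.Unstable
public class DistributedFileSystem extends FileSystem {
  // ...
}
{code}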

 Remove deprecated methods in DistributedFileSystem
 --

 Key: HDFS-6168
 URL: https://issues.apache.org/jira/browse/HDFS-6168
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Fix For: 2.5.0

 Attachments: h6168_20140327.patch, h6168_20140327b.patch


 Some methods in DistributedFileSystem have been deprecated for a long time. 
  They should be removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml, the values are not being considered

2014-04-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966723#comment-13966723
 ] 

Mit Desai commented on HDFS-2734:
-

I see that there has been no activity on this JIRA for a long time. 
[~andreina], is this still reproducible on your side? If this is still an 
issue, can you provide the information [~qwertymaniac] requested?
From the analysis that Harsh did, it appears this is not reproducible on his 
side, and I have not seen anyone else raising this concern. In that case, if I 
do not hear back by 4/17/14, I will go ahead and close this issue as Not A 
Problem.

-Mit

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml, the values are not being considered
 

 Key: HDFS-2734
 URL: https://issues.apache.org/jira/browse/HDFS-2734
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.20.1, 0.23.0
Reporter: J.Andreina
Priority: Minor

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml, the values are not being considered
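
For reference, this is the property in question; a minimal sketch of how it was read (67108864, i.e. 64 MB, was the shipped default in the affected versions):

{code}
// In the affected versions the SecondaryNameNode read this to decide
// when the edit log is large enough to force a checkpoint.
Configuration conf = new Configuration();  // loads core-site.xml, hdfs-site.xml
long checkpointSize = conf.getLong("fs.checkpoint.size", 67108864L);
{code}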



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6229) Race condition in failover can cause RetryCache to fail to work

2014-04-11 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6229:


   Resolution: Fixed
Fix Version/s: 2.4.1
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the review, Suresh! I've committed this to trunk, branch-2, and 
branch-2.4.

 Race condition in failover can cause RetryCache to fail to work
 

 Key: HDFS-6229
 URL: https://issues.apache.org/jira/browse/HDFS-6229
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.1.0-beta
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.4.1

 Attachments: HDFS-6229.000.patch, retrycache-race.patch


 Currently when NN failover happens, the old SBN first sets its state to 
 active, then starts the active services (including tailing all the remaining 
 editlog and building a complete retry cache based on the editlog). If a retry 
 request, which has already succeeded in the old ANN (but the client fails to 
 receive the response), comes in between, this retry may still get served by 
 the new ANN but miss the retry cache.
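
A condensed sketch of the race and of the fix direction; the statements below are illustrative, not the actual NameNode code:

{code}
// Buggy ordering: the node is advertised as active before the retry
// cache is rebuilt from the remaining edits.
state = ACTIVE;               // retried requests may arrive now...
tailRemainingEdits();         // ...while the retry cache is incomplete
buildRetryCacheFromEdits();   // too late for requests in the window

// Fix direction: complete the state needed to answer retries from
// the cache before exposing the active state.
tailRemainingEdits();
buildRetryCacheFromEdits();
state = ACTIVE;
{code}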



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6229) Race condition in failover can cause RetryCache to fail to work

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966777#comment-13966777
 ] 

Hudson commented on HDFS-6229:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5503 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5503/])
HDFS-6229. Race condition in failover can cause RetryCache fail to work. 
Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586714)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RetryCache.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java


 Race condition in failover can cause RetryCache to fail to work
 

 Key: HDFS-6229
 URL: https://issues.apache.org/jira/browse/HDFS-6229
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.1.0-beta
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.4.1

 Attachments: HDFS-6229.000.patch, retrycache-race.patch


 Currently when NN failover happens, the old SBN first sets its state to 
 active, then starts the active services (including tailing all the remaining 
 editlog and building a complete retry cache based on the editlog). If a retry 
 request, which has already succeeded in the old ANN (but the client fails to 
 receive the response), comes in between, this retry may still get served by 
 the new ANN but miss the retry cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966843#comment-13966843
 ] 

Colin Patrick McCabe commented on HDFS-6232:


It looks like you are trying to make {{XMLUtils#addSaxString}} treat a {{null}} 
value as equal to the empty string.  This seems confusing to me, since when we 
read the XML file and create edits again, we can't distinguish between null and 
the empty string.

While it's possible that null and empty really are interchangeable here, I 
would rather leave it to the caller to make this determination.  Why not just 
do the simple thing and fix the ACL code so it doesn't pass {{null}} to this 
function?
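
In other words, something along these lines at the call site (a hypothetical sketch of the suggestion, including the "NAME" tag, not the committed patch):

{code}
// Normalize at the caller, so XMLUtils#addSaxString never sees null
// and the XML round-trip stays unambiguous.
String name = entry.getName();
XMLUtils.addSaxString(contentHandler, "NAME", name == null ? "" : name);
{code}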

bq. I tried to use org.apache.xml.serialize.XMLSerializer in Apache Xerces, but 
it was deprecated. I'm thinking we should use another library or write our own 
code.

Yeah.  The deprecation warnings aren't relevant to this JIRA.  Check out 
HDFS-4629 if you're interested in a solution to that.

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when processing 
 an edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an acl on a file and then running the OEV on 
 the editsinprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6227) Short circuit read failed due to ClosedChannelException

2014-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966878#comment-13966878
 ] 

Colin Patrick McCabe commented on HDFS-6227:


bq. I quickly checked the code, in ShortCircuitCache#unref, we close the 
replica when the ref count is 0, but I did not find the corresponding code to 
remove the replica object. Thus is it possible that the cause of the issue is a 
closed ShortCircuitReplica object was still retrieved from the 
ShortCircuitCache and used for reading? Colin Patrick McCabe, could you provide 
some input?

That should not be possible.  See this code in {{ShortCircuitCache#unref}}:

{code}
  int newRefCount = --replica.refCount;
  if (newRefCount == 0) {
    // Close replica, since there are no remaining references to it.
    Preconditions.checkArgument(replica.purged,
        "Replica " + replica + " reached a refCount of 0 without " +
        "being purged");
    replica.close();
{code}

Notice that we would throw a precondition exception if the replica hadn't been 
purged.  ("Purged" means that it has been removed from the cache and will not 
be handed out to new readers.)  There is no other path to calling 
{{ShortCircuitReplica#close}}.

Can you say a little bit more about the platform and version that you saw this 
on?  Can you reproduce it?  Also, are there any other messages in the log?

 Short circuit read failed due to ClosedChannelException
 ---

 Key: HDFS-6227
 URL: https://issues.apache.org/jira/browse/HDFS-6227
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jing Zhao

 While running tests in a single node cluster, where short circuit read is 
 enabled and multiple threads may read the same file concurrently, one of the 
 read got ClosedChannelException and failed. Full exception trace see comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3128) Unit tests should not use a test root in /tmp

2014-04-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966914#comment-13966914
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3128:
---

 However, other parts of this change altered where tests created files in HDFS 
 MiniDFSClusters. These parts were not needed,...

These changes actually are incorrect, since test.build.data is a configuration 
property for the local file system.  It should not be used for paths inside 
HDFS MiniDFSClusters anyway.

 Overall, I think this problem is bound to keep occurring until we get Windows 
 build slaves. ...

Sure, if there were Windows Jenkins builds, the bug would be caught earlier.  
Or, if contributors/reviewers spent more time understanding the code first, the 
bug would also be caught.  Simply searching and replacing code without first 
understanding it is something we should avoid.

 Unit tests should not use a test root in /tmp
 -

 Key: HDFS-3128
 URL: https://issues.apache.org/jira/browse/HDFS-3128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.3.0
Reporter: Eli Collins
Assignee: Andrew Wang
Priority: Minor
 Fix For: 2.4.0

 Attachments: hdfs-3128-1.patch


 Saw this on jenkins, TestResolveHdfsSymlink#testFcResolveAfs creates 
 /tmp/alpha which interferes with other executors on the same machine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-241) ch{mod,own,grp} -R to do recursion at the name node

2014-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966929#comment-13966929
 ] 

Colin Patrick McCabe commented on HDFS-241:
---

+1 for closing this.  The more recursive / long-running RPCs we support, the 
harder it will be to scale the NN.
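
A minimal sketch of the client-side alternative that keeps the NameNode out of the recursion (assumed shape; {{distch}} and the committed tooling differ in detail):

{code}
// Recurse in the client with many small RPCs instead of one
// long-running recursive RPC that would hold NameNode locks for the
// whole subtree.
static void chmodRecursive(FileSystem fs, Path root, FsPermission perm)
    throws IOException {
  fs.setPermission(root, perm);
  for (FileStatus st : fs.listStatus(root)) {
    if (st.isDirectory()) {
      chmodRecursive(fs, st.getPath(), perm);
    } else {
      fs.setPermission(st.getPath(), perm);
    }
  }
}
{code}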

 ch{mod,own,grp} -R to do recursion at the name node
 ---

 Key: HDFS-241
 URL: https://issues.apache.org/jira/browse/HDFS-241
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Robert Chansler

 Performance. No need to maintain {{distch}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6208) DataNode caching can leak file descriptors.

2014-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966952#comment-13966952
 ] 

Colin Patrick McCabe commented on HDFS-6208:


Thanks for this fix, Chris.

 DataNode caching can leak file descriptors.
 ---

 Key: HDFS-6208
 URL: https://issues.apache.org/jira/browse/HDFS-6208
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6208.1.patch


 In the DataNode, management of mmap'd/mlock'd block files can leak file 
 descriptors.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6232:


Attachment: HDFS-6232.2.patch

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.2.patch, HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when processing 
 an edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an acl on a file and then running the OEV on 
 the editsinprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966977#comment-13966977
 ] 

Akira AJISAKA commented on HDFS-6232:
-

Thanks Colin for the comment. I updated the patch to distinguish null and the 
empty string.
bq. Yeah. The deprecation warnings aren't relevant to this JIRA. Check out 
HDFS-4629 if you're interested in a solution to that.
Thanks again for the link.

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Attachments: HDFS-6232.2.patch, HDFS-6232.patch


 The OfflineEditsViewer using the XML parser will throw an NPE when processing 
 an edit with a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an acl on a file and then running the OEV on 
 the editsinprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.

2014-04-11 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6233:
--

Summary: Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink 
error.  (was: Datanode upgrade in Windows from 1.x to 2.4 fails with symlink 
error.)

 Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
 --

 Key: HDFS-6233
 URL: https://issues.apache.org/jira/browse/HDFS-6233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, tools
Affects Versions: 2.4.0
 Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
 Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch


 I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start 
 due to a hard link exception.
 Repro steps:
 *Installed Hadoop 1.x
 *hadoop dfsadmin -safemode enter
 *hadoop dfsadmin -saveNamespace
 *hadoop namenode -finalize
 *Stop all services
 *Uninstall Hadoop 1.x 
 *Install Hadoop 2.4 
 *Start namenode with -upgrade option
 *Try to start datanode, begin to see Hardlink exception in datanode service 
 log.
 {code}
 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 8010: starting
 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Data-node version: -55 and name-node layout version: -56
 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Upgrading storage directory d:\hadoop\data\hdfs\dn.
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory 
 d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
 2014-04-10 22:47:12,254 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (Datanode Uuid unassigned) service to 
 myhost/10.0.0.1:8020
 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect 
 command line arguments.
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
   at java.lang.Thread.run(Thread.java:722)
 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Ending block pool service for: Block pool registering (Datanode Uuid 
 unassigned) service to myhost/10.0.0.1:8020
 2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Block pool ID needed, but service not yet registered with NN
 java.lang.Exception: trace
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
  

[jira] [Commented] (HDFS-6180) dead node count / listing is very broken in JMX and old GUI

2014-04-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967118#comment-13967118
 ] 

Mohammad Kamrul Islam commented on HDFS-6180:
-

bq. The changes turn out to be a lot bigger than I anticipated. It might be 
risky to put it in at the very last moment. Moving it to a blocker of 2.5.0.


What about release 2.4.1? It could be coming soon.


 dead node count / listing is very broken in JMX and old GUI
 ---

 Key: HDFS-6180
 URL: https://issues.apache.org/jira/browse/HDFS-6180
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Travis Thompson
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.5.0

 Attachments: HDFS-6180.000.patch, HDFS-6180.001.patch, 
 HDFS-6180.002.patch, HDFS-6180.003.patch, HDFS-6180.004.patch, dn.log


 After bringing up a 578 node cluster with 13 dead nodes, 0 were reported on 
 the new GUI, but showed up properly in the datanodes tab.  Some nodes are 
 also being double reported in the deadnode and inservice section (22 show up 
 dead, 565 show up alive, 9 duplicated nodes). 
 From /jmx (confirmed that it's the same in jconsole):
 {noformat}
 {
 "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
 "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
 "CapacityTotal" : 5477748687372288,
 "CapacityUsed" : 24825720407,
 "CapacityRemaining" : 5477723861651881,
 "TotalLoad" : 565,
 "SnapshotStats" : "{\"SnapshottableDirectories\":0,\"Snapshots\":0}",
 "BlocksTotal" : 21065,
 "MaxObjects" : 0,
 "FilesTotal" : 25454,
 "PendingReplicationBlocks" : 0,
 "UnderReplicatedBlocks" : 0,
 "ScheduledReplicationBlocks" : 0,
 "FSState" : "Operational",
 "NumLiveDataNodes" : 565,
 "NumDeadDataNodes" : 0,
 "NumDecomLiveDataNodes" : 0,
 "NumDecomDeadDataNodes" : 0,
 "NumDecommissioningDataNodes" : 0,
 "NumStaleDataNodes" : 1
   },
 {noformat}
 I'm not going to include deadnode/livenodes because the list is huge, but 
 I've confirmed there are 9 nodes showing up in both deadnodes and livenodes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.

2014-04-11 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967145#comment-13967145
 ] 

Arpit Agarwal commented on HDFS-6233:
-

+1 from me, however perhaps it will be appropriate for another committer to +1 
it too.

I've tested the updated patch on OS X and JDK and it fixes the hang.

 Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
 --

 Key: HDFS-6233
 URL: https://issues.apache.org/jira/browse/HDFS-6233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, tools
Affects Versions: 2.4.0
 Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
 Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch


 I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start 
 due to a hard link exception.
 Repro steps:
 *Installed Hadoop 1.x
 *hadoop dfsadmin -safemode enter
 *hadoop dfsadmin -saveNamespace
 *hadoop namenode -finalize
 *Stop all services
 *Uninstall Hadoop 1.x 
 *Install Hadoop 2.4 
 *Start namenode with -upgrade option
 *Try to start datanode, begin to see Hardlink exception in datanode service 
 log.
 {code}
 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 8010: starting
 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Data-node version: -55 and name-node layout version: -56
 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Upgrading storage directory d:\hadoop\data\hdfs\dn.
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory 
 d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
 2014-04-10 22:47:12,254 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (Datanode Uuid unassigned) service to 
 myhost/10.0.0.1:8020
 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect 
 command line arguments.
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
   at java.lang.Thread.run(Thread.java:722)
 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Ending block pool service for: Block pool registering (Datanode Uuid 
 unassigned) service to myhost/10.0.0.1:8020
 2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Block pool ID needed, but service not yet registered with NN
 java.lang.Exception: trace
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
   at 
 

[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.

2014-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967147#comment-13967147
 ] 

Hadoop QA commented on HDFS-6233:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639865/HDFS-6233.02.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6654//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6654//console

This message is automatically generated.

 Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
 --

 Key: HDFS-6233
 URL: https://issues.apache.org/jira/browse/HDFS-6233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, tools
Affects Versions: 2.4.0
 Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
 Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch


 I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start 
 due to a hard link exception.
 Repro steps:
 *Installed Hadoop 1.x
 *hadoop dfsadmin -safemode enter
 *hadoop dfsadmin -saveNamespace
 *hadoop namenode -finalize
 *Stop all services
 *Uninstall Hadoop 1.x 
 *Install Hadoop 2.4 
 *Start namenode with -upgrade option
 *Try to start datanode, begin to see Hardlink exception in datanode service 
 log.
 {code}
 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 8010: starting
 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Data-node version: -55 and name-node layout version: -56
 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Upgrading storage directory d:\hadoop\data\hdfs\dn.
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory 
 d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
 2014-04-10 22:47:12,254 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (Datanode Uuid unassigned) service to 
 myhost/10.0.0.1:8020
 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect 
 command line arguments.
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
   at 
 

[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.

2014-04-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967153#comment-13967153
 ] 

Jing Zhao commented on HDFS-6233:
-

+1 the patch looks good to me. Thanks for the fix [~cnauroth] and 
[~arpitagarwal]!

 Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
 --

 Key: HDFS-6233
 URL: https://issues.apache.org/jira/browse/HDFS-6233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, tools
Affects Versions: 2.4.0
 Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
 Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch


 I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start 
 due to a hard link exception.
 Repro steps:
 *Installed Hadoop 1.x
 *hadoop dfsadmin -safemode enter
 *hadoop dfsadmin -saveNamespace
 *hadoop namenode -finalize
 *Stop all services
 *Uninstall Hadoop 1.x 
 *Install Hadoop 2.4 
 *Start namenode with -upgrade option
 *Try to start datanode, begin to see Hardlink exception in datanode service 
 log.
 {code}
 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 8010: starting
 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Data-node version: -55 and name-node layout version: -56
 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Upgrading storage directory d:\hadoop\data\hdfs\dn.
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory 
 d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
 2014-04-10 22:47:12,254 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (Datanode Uuid unassigned) service to 
 myhost/10.0.0.1:8020
 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect 
 command line arguments.
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
   at java.lang.Thread.run(Thread.java:722)
 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Ending block pool service for: Block pool registering (Datanode Uuid 
 unassigned) service to myhost/10.0.0.1:8020
 2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Block pool ID needed, but service not yet registered with NN
 java.lang.Exception: trace
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
   at java.lang.Thread.run(Thread.java:722)
 

[jira] [Comment Edited] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.

2014-04-11 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967145#comment-13967145
 ] 

Arpit Agarwal edited comment on HDFS-6233 at 4/11/14 10:05 PM:
---

+1 from me, however perhaps it will be appropriate for another committer to +1 
it too.

I've tested the updated patch on OS X and Windows and it fixes the hang.


was (Author: arpitagarwal):
+1 from me, however perhaps it will be appropriate for another committer to +1 
it too.

I've tested the updated patch on OS X and JDK and it fixes the hang.

 Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
 --

 Key: HDFS-6233
 URL: https://issues.apache.org/jira/browse/HDFS-6233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, tools
Affects Versions: 2.4.0
 Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
 Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch


 I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start 
 due to a hard link exception.
 Repro steps:
 *Installed Hadoop 1.x
 *hadoop dfsadmin -safemode enter
 *hadoop dfsadmin -saveNamespace
 *hadoop namenode -finalize
 *Stop all services
 *Uninstall Hadoop 1.x 
 *Install Hadoop 2.4 
 *Start namenode with -upgrade option
 *Try to start datanode, begin to see Hardlink exception in datanode service 
 log.
 {code}
 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 8010: starting
 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Data-node version: -55 and name-node layout version: -56
 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Upgrading storage directory d:\hadoop\data\hdfs\dn.
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory 
 d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
 2014-04-10 22:47:12,254 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (Datanode Uuid unassigned) service to 
 myhost/10.0.0.1:8020
 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect 
 command line arguments.
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
   at java.lang.Thread.run(Thread.java:722)
 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Ending block pool service for: Block pool registering (Datanode Uuid 
 unassigned) service to myhost/10.0.0.1:8020
 2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Block pool ID needed, but service not yet registered with NN
 java.lang.Exception: trace
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
   at 
 

[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6232:
---

   Resolution: Fixed
Fix Version/s: 2.4.1
   Status: Resolved  (was: Patch Available)

Committed.  Thanks, Akira.

 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Fix For: 2.4.1

 Attachments: HDFS-6232.2.patch, HDFS-6232.patch


 The OfflineEditsViewer will throw an NPE when using the XML parser on an 
 edit containing a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.
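
For context, a hypothetical sketch of the failure mode (names are illustrative, not the actual FSEditLogOp/XMLUtils code): an ACL entry's name can be absent, and escaping a null string for XML is what produces the NPE in the stack trace above, so the serializer has to guard optional fields:

{code}
import java.util.ArrayList;
import java.util.List;

public class AclXmlSketch {
  // Stand-in for an XML-escaping helper; it dereferences its argument,
  // so passing null throws a NullPointerException.
  static String escapeForXml(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  // Serialize ACL entries of the form {type, name, permissions}; unnamed
  // entries such as "user::rw-" carry a null name and must be skipped.
  static List<String> aclEntriesToXml(List<String[]> entries) {
    List<String> out = new ArrayList<>();
    for (String[] entry : entries) {
      out.add("<TYPE>" + escapeForXml(entry[0]) + "</TYPE>");
      String name = entry[1];
      if (name != null && !name.isEmpty()) { // guard the optional field
        out.add("<NAME>" + escapeForXml(name) + "</NAME>");
      }
      out.add("<PERM>" + escapeForXml(entry[2]) + "</PERM>");
    }
    return out;
  }
}
{code}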



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications

2014-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967213#comment-13967213
 ] 

Hudson commented on HDFS-6232:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5507 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5507/])
HDFS-6232. OfflineEditsViewer throws a NPE on edits containing ACL 
modifications (ajisakaa via cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586790)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java


 OfflineEditsViewer throws a NPE on edits containing ACL modifications
 -

 Key: HDFS-6232
 URL: https://issues.apache.org/jira/browse/HDFS-6232
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
 Fix For: 2.4.1

 Attachments: HDFS-6232.2.patch, HDFS-6232.patch


 The OfflineEditsViewer will throw an NPE when using the XML parser on an 
 edit containing a SET_ACL op.
 {code}
 [root@hdfs-nfs current]# hdfs oev -i 
 edits_001-007 -o fsedits.out
 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got 
 RuntimeException at position 505
 Encountered exception. Exiting: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
   at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
 [root@hdfs-nfs current]# 
 {code}
 This is reproducible by setting an ACL on a file and then running the OEV on 
 the edits_inprogress file.
 The stats and binary parsers run OK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk

2014-04-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967252#comment-13967252
 ] 

Jing Zhao edited comment on HDFS-4114 at 4/11/14 11:50 PM:
---

bq. There are a couple TODOs that require changes in the protobuf files. I'll 
clean them up in a subsequent jira.
Sounds good to me. There are still a couple of methods that need to be cleaned, 
such as endCheckpoint. But we can do it in subsequent jiras. The 001 patch 
looks good to me as a first step. +1.



was (Author: jingzhao):
bq. There are a couple TODOs that require changes in the protobuf files. I'll 
clean them up in a subsequent jira.
Sounds good to me. There are still a couple of methods that need to be cleaned, 
such as endCheckpoint. But we can do it in subsequence jiras. The 001 patch 
looks good to me as a first step. +1.


 Remove the BackupNode and CheckpointNode from trunk
 ---

 Key: HDFS-4114
 URL: https://issues.apache.org/jira/browse/HDFS-4114
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
Assignee: Suresh Srinivas
 Attachments: HDFS-4114.000.patch, HDFS-4114.001.patch, HDFS-4114.patch


 Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the 
 BackupNode and CheckpointNode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk

2014-04-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967252#comment-13967252
 ] 

Jing Zhao commented on HDFS-4114:
-

bq. There are a couple TODOs that require changes in the protobuf files. I'll 
clean them up in a subsequent jira.
Sounds good to me. There are still a couple of methods that need to be cleaned, 
such as endCheckpoint. But we can do it in subsequence jiras. The 001 patch 
looks good to me as a first step. +1.


 Remove the BackupNode and CheckpointNode from trunk
 ---

 Key: HDFS-4114
 URL: https://issues.apache.org/jira/browse/HDFS-4114
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
Assignee: Suresh Srinivas
 Attachments: HDFS-4114.000.patch, HDFS-4114.001.patch, HDFS-4114.patch


 Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the 
 BackupNode and CheckpointNode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk

2014-04-11 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967261#comment-13967261
 ] 

Haohui Mai commented on HDFS-4114:
--

Thanks [~jingzhao] for the review. I'll commit it to trunk this weekend.

 Remove the BackupNode and CheckpointNode from trunk
 ---

 Key: HDFS-4114
 URL: https://issues.apache.org/jira/browse/HDFS-4114
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
Assignee: Suresh Srinivas
 Attachments: HDFS-4114.000.patch, HDFS-4114.001.patch, HDFS-4114.patch


 Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the 
 BackupNode and CheckpointNode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.

2014-04-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-6236:
---

 Summary: ImageServlet should use Time#monotonicNow to measure 
latency.
 Key: HDFS-6236
 URL: https://issues.apache.org/jira/browse/HDFS-6236
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{ImageServlet}} currently uses {{Time#now}} to take file transfer latency 
measurements and pass them to the metrics system.  It would be preferable to 
use {{Time#monotonicNow}}, so that we use the most precise available system 
timer and are not subject to odd results, such as negative latency 
measurements when the system clock is reset.
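
As a standard-JDK sketch of the difference (illustrative only, not the patch): a wall-clock interval can go negative if the system clock is reset in between, while a monotonic reading cannot, which is why {{Time#monotonicNow}} (typically backed by {{System#nanoTime}}) is the safer source for latency metrics:

{code}
public class TimerSketch {
  public static void main(String[] args) throws InterruptedException {
    long wallStart = System.currentTimeMillis(); // wall clock, can be reset
    long monoStart = System.nanoTime();          // monotonic, only moves forward
    Thread.sleep(50);                            // stand-in for a file transfer
    long wallMs = System.currentTimeMillis() - wallStart;       // negative if clock reset
    long monoMs = (System.nanoTime() - monoStart) / 1_000_000L; // always >= 0
    System.out.println("wall=" + wallMs + "ms mono=" + monoMs + "ms");
  }
}
{code}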



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6236:


Attachment: HDFS-6236.1.patch

I noticed this due to a failure in {{TestCheckpoint#testCheckpoint}} on 
Windows.  There are assertions that the metrics are greater than zero, and 
these assertions were failing.  Windows in particular has a low-precision 
implementation of {{System#currentTimeMillis}}.  This patch fixes the issue.  
Tests are passing consistently for me on both Mac and Windows now.

I also discovered a problem inside 
{{TestCheckpoint#testReformatNNBetweenCheckpoints}}.  Restarting a new 
{{MiniDFSCluster}} ended up trying to use the same storage directory as a 
{{SecondaryNameNode}} that was intentionally left running in the background.  
The new cluster would fail during initialization due to file locks while trying 
to delete the storage directory.  To solve this, I've cloned the configuration 
and set a different storage dir for use by the 2NN.
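
A minimal sketch of that test-side isolation (assuming the standard HDFS config key and constructor; the directory name here is hypothetical, not the patch's actual value):

{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode;

public class SnnIsolationSketch {
  static SecondaryNameNode startIsolated2NN(Configuration clusterConf,
      File baseDir) throws Exception {
    // Clone the conf so the background 2NN does not share mutable state
    // or storage directories with a restarted MiniDFSCluster.
    Configuration snnConf = new Configuration(clusterConf);
    snnConf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_DIR_KEY,
        new File(baseDir, "namesecondary-isolated").getAbsolutePath());
    return new SecondaryNameNode(snnConf);
  }
}
{code}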


 ImageServlet should use Time#monotonicNow to measure latency.
 -

 Key: HDFS-6236
 URL: https://issues.apache.org/jira/browse/HDFS-6236
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-6236.1.patch


 {{ImageServlet}} currently uses {{Time#now}} to take file transfer latency 
 measurements and pass them to the metrics system.  It would be preferable to 
 use {{Time#monotonicNow}}, so that we use the most precise available system 
 timer and are not subject to odd results, such as negative latency 
 measurements when the system clock is reset.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.

2014-04-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6236:


Status: Patch Available  (was: Open)

 ImageServlet should use Time#monotonicNow to measure latency.
 -

 Key: HDFS-6236
 URL: https://issues.apache.org/jira/browse/HDFS-6236
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-6236.1.patch


 {{ImageServlet}} currently uses {{Time#now}} to take file transfer latency 
 measurements and pass them to the metrics system.  It would be preferable to 
 use {{Time#monotonicNow}}, so that we use the most precise available system 
 timer and are not subject to odd results, such as negative latency 
 measurements when the system clock is reset.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.

2014-04-11 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967284#comment-13967284
 ] 

Haohui Mai commented on HDFS-6236:
--

+1 pending jenkins.

 ImageServlet should use Time#monotonicNow to measure latency.
 -

 Key: HDFS-6236
 URL: https://issues.apache.org/jira/browse/HDFS-6236
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-6236.1.patch


 {{ImageServlet}} currently uses {{Time#now}} to take file transfer latency 
 measurements and pass them to the metrics system.  It would be preferable to 
 use {{Time#monotonicNow}}, so that we use the most precise available system 
 timer and are not subject to odd results, such as negative latency 
 measurements when the system clock is reset.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.

2014-04-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967341#comment-13967341
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6233:
---

If there are bugs in getLinkMultArgLength but the bugs do not affect upgrade, 
we may fix them separately.
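
For reference, a hypothetical sketch of what an argument-length helper has to compute (names are illustrative, not the actual HardLink internals): the total command-line bytes a batch of link operations would occupy, so callers can split the batch before hitting an OS command-length cap:

{code}
import java.io.File;

public class LinkArgLengthSketch {
  // Rough per-batch length: source path plus destination path per file, with
  // quoting and separating spaces. Callers compare this against the platform
  // limit and split the batch when it would overflow.
  static int getLinkMultArgLength(File parent, String[] fileNames, File toDir) {
    int total = 0;
    for (String name : fileNames) {
      total += new File(parent, name).getPath().length()
             + new File(toDir, name).getPath().length()
             + 6; // two quote pairs and two spaces
    }
    return total;
  }
}
{code}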

 Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
 --

 Key: HDFS-6233
 URL: https://issues.apache.org/jira/browse/HDFS-6233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, tools
Affects Versions: 2.4.0
 Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
 Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch


 I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start 
 due to a hard link exception.
 Repro steps:
 * Install Hadoop 1.x
 * hadoop dfsadmin -safemode enter
 * hadoop dfsadmin -saveNamespace
 * hadoop namenode -finalize
 * Stop all services
 * Uninstall Hadoop 1.x
 * Install Hadoop 2.4
 * Start the namenode with the -upgrade option
 * Try to start the datanode; the HardLink exception below appears in the 
 datanode service log.
 {code}
 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 8010: starting
 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Data-node version: -55 and name-node layout version: -56
 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Upgrading storage directory d:\hadoop\data\hdfs\dn.
old LV = -44; old CTime = 0.
new LV = -55; new CTime = 1397168400373
 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory 
 d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
 2014-04-10 22:47:12,254 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (Datanode Uuid unassigned) service to 
 myhost/10.0.0.1:8020
 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect 
 command line arguments.
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
   at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
   at java.lang.Thread.run(Thread.java:722)
 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Ending block pool service for: Block pool registering (Datanode Uuid 
 unassigned) service to myhost/10.0.0.1:8020
 2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Block pool ID needed, but service not yet registered with NN
 java.lang.Exception: trace
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
   at 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
   at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
 {code}

[jira] [Commented] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.

2014-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967367#comment-13967367
 ] 

Hadoop QA commented on HDFS-6236:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639894/HDFS-6236.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6655//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6655//console

This message is automatically generated.

 ImageServlet should use Time#monotonicNow to measure latency.
 -

 Key: HDFS-6236
 URL: https://issues.apache.org/jira/browse/HDFS-6236
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-6236.1.patch


 {{ImageServlet}} currently uses {{Time#now}} to take file transfer latency 
 measurements and pass them to the metrics system.  It would be preferable to 
 use {{Time#monotonicNow}}, so that we use the most precise available system 
 timer and are not subject to odd results, such as negative latency 
 measurements when the system clock is reset.



--
This message was sent by Atlassian JIRA
(v6.2#6252)