[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177057#comment-13177057
 ] 

Hudson commented on HDFS-2726:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1497 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1497/])
HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream 
method (harsh)

harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java


 Exception in createBlockOutputStream shouldn't delete exception stack trace
 -

 Key: HDFS-2726
 URL: https://issues.apache.org/jira/browse/HDFS-2726
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Michael Bieniosek
Assignee: Harsh J
 Fix For: 0.24.0

 Attachments: HDFS-2726.patch


 I'm occasionally (1/5000 times) getting this error after upgrading everything 
 to hadoop-0.18:
 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream 
 java.io.IOException: Could not read from stream
 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block 
 blk_624229997631234952_8205908
 DFSClient contains the logging code:
 LOG.info("Exception in createBlockOutputStream " + ie);
 This would be better written with ie as the second argument to LOG.info, so 
 that the stack trace could be preserved.  As it is, I don't know how to start 
 debugging.
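
The difference the report describes can be seen in a minimal, self-contained sketch (the class and method names here are illustrative, not DFSClient's): concatenating a Throwable into the message string keeps only its toString(), while handing it to the logger as a separate argument, as commons-logging's LOG.info(String, Throwable) does, lets the full stack trace be rendered.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class LogDemo {
    // What the old DFSClient line does: string concatenation calls
    // Throwable.toString(), so only the class name and message survive.
    static String concatStyle(Throwable t) {
        return "Exception in createBlockOutputStream " + t;
    }

    // What passing the throwable as a second logger argument gets you:
    // the full stack trace (emulated here with printStackTrace).
    static String twoArgStyle(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return "Exception in createBlockOutputStream\n" + sw;
    }

    public static void main(String[] args) {
        Throwable t = new java.io.IOException("Could not read from stream");
        System.out.println(concatStyle(t).contains("\tat "));  // no frames
        System.out.println(twoArgStyle(t).contains("\tat "));  // frames kept
    }
}
```

With the two-argument form, every "at ..." frame reaches the log, which is exactly what the reporter needs to start debugging.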

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2394) Add tests for Namenode active standby states

2011-12-29 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177063#comment-13177063
 ] 

Suresh Srinivas commented on HDFS-2394:
---

You are right. Existing tests cover this.

 Add tests for Namenode active standby states
 

 Key: HDFS-2394
 URL: https://issues.apache.org/jira/browse/HDFS-2394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node, test
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas







[jira] [Resolved] (HDFS-2394) Add tests for Namenode active standby states

2011-12-29 Thread Suresh Srinivas (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-2394.
---

Resolution: Invalid

 Add tests for Namenode active standby states
 

 Key: HDFS-2394
 URL: https://issues.apache.org/jira/browse/HDFS-2394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node, test
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas







[jira] [Commented] (HDFS-4) DF should use used + available as the capacity of this volume

2011-12-29 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177079#comment-13177079
 ] 

Harsh J commented on HDFS-4:


Are there any other disadvantages you can think of in going with 
used+available? Any edge cases where that sum may be incorrect to use?

 DF should use used + available as the capacity of this volume
 -

 Key: HDFS-4
 URL: https://issues.apache.org/jira/browse/HDFS-4
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: UNIX
Reporter: Rong-En Fan
  Labels: newbie

 Generally speaking, UNIX tends to keep a certain percentage of disk space 
 reserved for root use only (this can be changed via tune2fs or at mkfs time). 
 Therefore, Hadoop's DF class should not use the first column of df output as 
 the capacity of this volume. Instead, it should use used+available as its 
 capacity.
 Otherwise, the datanode may think this volume is not full when in fact it is.
 The code in question is src/core/org/apache/hadoop/fs/DF.java, method 
 parseExecResult()
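
The suggestion can be sketched as follows. This is a hedged illustration, not the real DF.parseExecResult: the class and method names are mine, and only the arithmetic matters. The raw 1K-blocks column includes the root-reserved blocks, so used + available is the capacity actually usable by non-root writers.

```java
public class DfParse {
    // Parse one "df -k" data row:
    //   filesystem  1K-blocks  used  available  use%  mounted-on
    // and return used + available as the effective capacity in KB.
    static long capacityKb(String dfRow) {
        String[] f = dfRow.trim().split("\\s+");
        long used = Long.parseLong(f[2]);
        long available = Long.parseLong(f[3]);
        return used + available;
    }

    public static void main(String[] args) {
        // 1000000 raw blocks, but ~5% reserved for root: the capacity a
        // datanode can actually fill is only 300000 + 650000 = 950000.
        System.out.println(capacityKb("/dev/sda1 1000000 300000 650000 32% /data"));
    }
}
```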





[jira] [Resolved] (HDFS-21) unresponsive namenode because of not finding places to replicate

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-21.
-

Resolution: Won't Fix

This is a clear effect of tweaking dfs.replication.min. You want your HDFS to 
guarantee X replicas before a file is closed, and that's what it will do.

Resolving as Won't Fix.

 unresponsive namenode because of not finding places to replicate
 

 Key: HDFS-21
 URL: https://issues.apache.org/jira/browse/HDFS-21
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Christian Kunz

 We have an 80-node cluster where many nodes started to fail, such that it went 
 down to 59 live nodes. Originally we had our set of applications replicated 
 60 times. The cluster size went below the required replication number, and 
 started to become increasingly less responsive, spewing out the following 
 messages at a high rate:
 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, 
 still in need of 2





[jira] [Resolved] (HDFS-10) DFS logging in NameSystem.pendingTransfer consumes all disk space

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-10.
-

Resolution: Won't Fix

These messages help ops determine HDFS activity. If you do not wish to see them 
at all, you may turn the logging up to WARN or a higher level. It's INFO by 
default.

Resolving as Won't Fix, as these messages are useful and not so verbose that 
they should be DEBUG-only.
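
Turning that logger up could look like the following log4j.properties fragment. The logger name is an assumption based on the log lines quoted in the report and may differ between releases:

```properties
# Hypothetical log4j.properties fragment: raise the block state-change
# logger (org.apache.hadoop.dfs.StateChange in 0.x releases) above INFO
# so NameSystem.pendingTransfer messages no longer flood the disk.
log4j.logger.org.apache.hadoop.dfs.StateChange=WARN
```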

 DFS logging in NameSystem.pendingTransfer consumes all disk space
 -

 Key: HDFS-10
 URL: https://issues.apache.org/jira/browse/HDFS-10
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Michael Bieniosek

 Sometimes the namenode goes crazy.  I see this in my logs:
 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate 
 blk_-9064654741761822118 to datanode(s) x.y.z.247:50010
 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate 
 blk_-8996500637974689840 to datanode(s) x.y.yz.225:50010
 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate 
 blk_-8870980160272831217 to datanode(s) x.y.z.244:50010
 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate 
 blk_-8721101562083234290 to datanode(s) x.y.z.250:50010
 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask x.y.z.250:50010 to replicate 
 blk_-9044741671491162229 to datanode(s) x.y.z.244:50010
 These arrive on the order of 10k/sec until the machine runs out of disk space.
 I notice that in FSNamesystem.java, about 10 lines above where this line is 
 logged, there is a comment:
 //
 // Move the block-replication into a pending state.
 // The reason we use 'pending' is so we can retry
 // replications that fail after an appropriate amount of time.
 // (REMIND - mjc - this timer is not yet implemented.)
 //





[jira] [Created] (HDFS-2727) hdfs.c uses deprecated property dfs.block.size

2011-12-29 Thread Sho Shimauchi (Created) (JIRA)
hdfs.c uses deprecated property dfs.block.size
--

 Key: HDFS-2727
 URL: https://issues.apache.org/jira/browse/HDFS-2727
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 0.23.0
Reporter: Sho Shimauchi
Priority: Minor


hdfs.c uses deprecated property dfs.block.size.
It should use the new property dfs.blocksize instead.





[jira] [Resolved] (HDFS-9) distcp job failed

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-9.


Resolution: Incomplete

This could very well have been a transient issue. Let's open a new JIRA if it 
recurs too frequently. This one has gone stale across versions.

 distcp job failed
 -

 Key: HDFS-9
 URL: https://issues.apache.org/jira/browse/HDFS-9
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Runping Qi

 I was running distcp to copy data from one dfs to another.
 The job failed with the following exception in the mappers:
 java.net.SocketException: Connection reset
   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
   at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
   at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1633)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1720)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
   at 
 org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
   at 
 org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.copy(CopyFiles.java:305)
   at 
 org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:352)
   at 
 org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:217)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:195)
   at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1750)
 I examined the data node logs of the target dfs. I saw a lot of exceptions 
 like:
 2007-10-12 15:04:09,109 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: 
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1365)
 at 
 org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:897)
 at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:763)
 at java.lang.Thread.run(Thread.java:619)





[jira] [Commented] (HDFS-2727) hdfs.c uses deprecated property dfs.block.size

2011-12-29 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177095#comment-13177095
 ] 

Harsh J commented on HDFS-2727:
---

It should not rely on properties for dfs.blocksize and dfs.replication, and 
instead fetch those from the jFS object itself, via the getDefaultBlockSize and 
getDefaultReplication API calls. This will help avoid maintenance in the future :)

 hdfs.c uses deprecated property dfs.block.size
 --

 Key: HDFS-2727
 URL: https://issues.apache.org/jira/browse/HDFS-2727
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 0.23.0
Reporter: Sho Shimauchi
Priority: Minor

 hdfs.c uses deprecated property dfs.block.size.
 It should use the new property dfs.blocksize instead.





[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value

2011-12-29 Thread Sho Shimauchi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sho Shimauchi updated HDFS-1314:


Attachment: hdfs-1314.txt

attached

* revert hdfs.c
* add more info to hdfs-default.xml and cluster_setup.xml

 dfs.block.size accepts only absolute value
 --

 Key: HDFS-1314
 URL: https://issues.apache.org/jira/browse/HDFS-1314
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Karim Saadah
Assignee: Sho Shimauchi
Priority: Minor
  Labels: newbie
 Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt


 Using dfs.block.size=8388608 works 
 but dfs.block.size=8mb does not.
 Using dfs.block.size=8mb should throw some WARNING on NumberFormatException.
 (http://pastebin.corp.yahoo.com/56129)
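
What accepting suffixed values would involve can be sketched with a minimal parser. This is a hypothetical helper (the class SizeParse and method parseSize are not Hadoop code), shown only to make the 8mb = 8388608 mapping concrete; binary multipliers are assumed:

```java
public class SizeParse {
    // Accept a plain byte count or a k/kb, m/mb, g/gb (binary) suffix.
    static long parseSize(String v) {
        String s = v.trim().toLowerCase();
        long mult = 1;
        if (s.endsWith("kb") || s.endsWith("k")) mult = 1L << 10;
        else if (s.endsWith("mb") || s.endsWith("m")) mult = 1L << 20;
        else if (s.endsWith("gb") || s.endsWith("g")) mult = 1L << 30;
        String digits = s.replaceAll("[a-z]+$", "");  // strip the suffix
        return Long.parseLong(digits) * mult;         // NumberFormatException
                                                      // still surfaces for junk
    }

    public static void main(String[] args) {
        System.out.println(parseSize("8mb"));     // same value as 8388608
        System.out.println(parseSize("8388608")); // plain bytes still work
    }
}
```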





[jira] [Resolved] (HDFS-67) /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-67.
-

Resolution: Not A Problem

Not a problem after Dhruba's HDFS-1707.

 /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly
 ---

 Key: HDFS-67
 URL: https://issues.apache.org/jira/browse/HDFS-67
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Benjamin Francisoud
 Attachments: patch-DFSClient-HADOOP-2561.diff


 Directory /tmp/hadoop-${user}/dfs/tmp/tmp is being filled with these kinds 
 of files: client-226966559287638337420857.tmp
 I tried to look at the code and found:
 h3. DFSClient.java
 src/java/org/apache/hadoop/dfs/DFSClient.java
 {code:java}
 private void closeBackupStream() throws IOException {...}
 /* Similar to closeBackupStream(). Theoritically deleting a file
  * twice could result in deleting a file that we should not.
  */
 private void deleteBackupFile() {...}
 private File newBackupFile() throws IOException {
 String name = "tmp" + File.separator +
  "client-" + Math.abs(r.nextLong());
 File result = dirAllocator.createTmpFileForWrite(name,
2 * blockSize,
conf);
 return result;
 }
 {code}
 h3. LocalDirAllocator
 src/java/org/apache/hadoop/fs/LocalDirAllocator.java#AllocatorPerContext.java
 {code:java}
 /** Creates a file on the local FS. Pass size as -1 if not known apriori. We
  *  round-robin over the set of disks (via the configured dirs) and return
  *  a file on the first path which has enough space. The file is guaranteed
  *  to go away when the JVM exits.
  */
 public File createTmpFileForWrite(String pathStr, long size,
 Configuration conf) throws IOException {
 // find an appropriate directory
 Path path = getLocalPathForWrite(pathStr, size, conf);
 File dir = new File(path.getParent().toUri().getPath());
 String prefix = path.getName();
 // create a temp file on this directory
 File result = File.createTempFile(prefix, null, dir);
 result.deleteOnExit();
 return result;
 }
 {code}
 First, it seems to me it's a bit of a mess here: I don't know whether it is 
 DFSClient.java#deleteBackupFile() or 
 LocalDirAllocator#createTmpFileForWrite() (via deleteOnExit()) that gets 
 called, or both. Why not keep it DRY and delete the file only once?
 But the most important part is the deleteOnExit(): it means that if the JVM 
 is never restarted, it will never delete the files :(
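
The deleteOnExit() problem the report describes can be shown in a self-contained sketch (TmpCleanup and writeBackup are hypothetical names, not DFSClient code): deleteOnExit() only removes files at JVM shutdown, so a long-lived client leaks one temp file per block, while an eager delete in a finally block cleans up as soon as the file is done with.

```java
import java.io.File;
import java.io.IOException;

public class TmpCleanup {
    // Create a backup temp file, use it, and delete it eagerly instead of
    // relying on File.deleteOnExit(), which never fires for a JVM that
    // stays up indefinitely.
    static File writeBackup(File dir) throws IOException {
        File tmp = File.createTempFile("client-", ".tmp", dir);
        try {
            // ... write block data into tmp here ...
            return tmp;
        } finally {
            tmp.delete();  // eager cleanup; no wait for JVM exit
        }
    }
}
```

After writeBackup returns, the file is already gone from /tmp, which is the behavior the reporter wants.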





[jira] [Commented] (HDFS-67) /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly

2011-12-29 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177204#comment-13177204
 ] 

Harsh J commented on HDFS-67:
-

Er, make that HADOOP-1707 sorry.

 /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly
 ---

 Key: HDFS-67
 URL: https://issues.apache.org/jira/browse/HDFS-67
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Benjamin Francisoud
 Attachments: patch-DFSClient-HADOOP-2561.diff


 Directory /tmp/hadoop-${user}/dfs/tmp/tmp is being filled with these kinds 
 of files: client-226966559287638337420857.tmp
 I tried to look at the code and found:
 h3. DFSClient.java
 src/java/org/apache/hadoop/dfs/DFSClient.java
 {code:java}
 private void closeBackupStream() throws IOException {...}
 /* Similar to closeBackupStream(). Theoritically deleting a file
  * twice could result in deleting a file that we should not.
  */
 private void deleteBackupFile() {...}
 private File newBackupFile() throws IOException {
 String name = "tmp" + File.separator +
  "client-" + Math.abs(r.nextLong());
 File result = dirAllocator.createTmpFileForWrite(name,
2 * blockSize,
conf);
 return result;
 }
 {code}
 h3. LocalDirAllocator
 src/java/org/apache/hadoop/fs/LocalDirAllocator.java#AllocatorPerContext.java
 {code:java}
 /** Creates a file on the local FS. Pass size as -1 if not known apriori. We
  *  round-robin over the set of disks (via the configured dirs) and return
  *  a file on the first path which has enough space. The file is guaranteed
  *  to go away when the JVM exits.
  */
 public File createTmpFileForWrite(String pathStr, long size,
 Configuration conf) throws IOException {
 // find an appropriate directory
 Path path = getLocalPathForWrite(pathStr, size, conf);
 File dir = new File(path.getParent().toUri().getPath());
 String prefix = path.getName();
 // create a temp file on this directory
 File result = File.createTempFile(prefix, null, dir);
 result.deleteOnExit();
 return result;
 }
 {code}
 First, it seems to me it's a bit of a mess here: I don't know whether it is 
 DFSClient.java#deleteBackupFile() or 
 LocalDirAllocator#createTmpFileForWrite() (via deleteOnExit()) that gets 
 called, or both. Why not keep it DRY and delete the file only once?
 But the most important part is the deleteOnExit(): it means that if the JVM 
 is never restarted, it will never delete the files :(





[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177315#comment-13177315
 ] 

Hudson commented on HDFS-2729:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1552 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1552/])
HDFS-2729. Update BlockManager's comments regarding the invalid block set 
(harsh)

harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.24.0

 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was resolved at some point, the comments and logs 
 still refer to two sets when there is really just one set.
 This patch changes the logs and comments to be more accurate about that.





[jira] [Resolved] (HDFS-5) Check that network topology is updated when new data-nodes are joining the cluster

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-5.


Resolution: Cannot Reproduce

The mapping is done pretty much properly as far as I've noticed. With caching 
enabled though, one needs to restart the NN for it to take proper effect.

 Check that network topology is updated when new data-nodes are joining the 
 cluster
 --

 Key: HDFS-5
 URL: https://issues.apache.org/jira/browse/HDFS-5
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Konstantin Shvachko

 There is a suspicion that network topology is not updated if new racks are 
 added to the cluster. We should investigate and either confirm or rule this 
 out.





[jira] [Resolved] (HDFS-58) DistributedFileSystem.listPaths with some paths causes directory to be cleared

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-58.
-

Resolution: Cannot Reproduce

This has gone stale, and looking at the listStatus impls, it looks like this 
could not happen.

Can't reproduce, closing out.

 DistributedFileSystem.listPaths with some paths causes directory to be cleared
 --

 Key: HDFS-58
 URL: https://issues.apache.org/jira/browse/HDFS-58
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Linux
Reporter: Bryan Duxbury

 I am currently writing a Ruby wrapper to the Java DFS client libraries via 
 JNI. While attempting to test the listPaths method of the FileSystem class, I 
 discovered that passing a Path URI like hdfs://tf11:7276/user/rapleaf 
 results in the /user/rapleaf directory being cleared of all contents. A path 
 URI like hdfs://tf11:7276/user/rapleaf/* will list the contents of the 
 directory without damage. 
 I have verified this by creating directories and listing via the bin/hadoop 
 dfs -ls command. 
 Obviously, passing an incorrectly formatted string to a method that should be 
 read-only should not have destructive effects. Also, the actual required path 
 syntax for listings should be recorded in the documentation.





[jira] [Resolved] (HDFS-97) DFS should detect slow links(nodes) and avoid them

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-97.
-

Resolution: Not A Problem

We do tend to avoid highly loaded DataNodes (via xceiver counts), which 
achieves almost the same thing.

Resolving as not a problem.

 DFS should detect slow links(nodes) and avoid them
 --

 Key: HDFS-97
 URL: https://issues.apache.org/jira/browse/HDFS-97
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Runping Qi

 The current DFS does not detect slow links (nodes).
 Thus, when a node or its network link is slow, it may affect the overall 
 system performance significantly.
 Specifically, when a map job needs to read data from such a node, it may 
 progress 10X slower.
 And when a DFS data node pipeline consists of such a node, the write 
 performance degrades significantly.
 This may lead to some long tails for map/reduce jobs. We have experienced 
 such behaviors quite often.





[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value

2011-12-29 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177112#comment-13177112
 ] 

Harsh J commented on HDFS-1314:
---

Not sure why the patch failed. Perhaps it's because of the docs change in 
hadoop-common? Could you submit the same patch with just that change removed? 
I'll add it back in later when committing (and will upload a cumulative patch 
when doing that).

 dfs.block.size accepts only absolute value
 --

 Key: HDFS-1314
 URL: https://issues.apache.org/jira/browse/HDFS-1314
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Karim Saadah
Assignee: Sho Shimauchi
Priority: Minor
  Labels: newbie
 Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt


 Using dfs.block.size=8388608 works 
 but dfs.block.size=8mb does not.
 Using dfs.block.size=8mb should throw some WARNING on NumberFormatException.
 (http://pastebin.corp.yahoo.com/56129)





[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177288#comment-13177288
 ] 

Hudson commented on HDFS-2729:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1501 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1501/])
HDFS-2729. Update BlockManager's comments regarding the invalid block set 
(harsh)

harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.24.0

 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was resolved at some point, the comments and logs 
 still refer to two sets when there is really just one set.
 This patch changes the logs and comments to be more accurate about that.





[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177174#comment-13177174
 ] 

Hudson commented on HDFS-2726:
--

Integrated in Hadoop-Mapreduce-trunk #942 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/942/])
HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream 
method (harsh)

harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java


 Exception in createBlockOutputStream shouldn't delete exception stack trace
 -

 Key: HDFS-2726
 URL: https://issues.apache.org/jira/browse/HDFS-2726
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Michael Bieniosek
Assignee: Harsh J
 Fix For: 0.24.0

 Attachments: HDFS-2726.patch


 I'm occasionally (1/5000 times) getting this error after upgrading everything 
 to hadoop-0.18:
 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream 
 java.io.IOException: Could not read from stream
 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block 
 blk_624229997631234952_8205908
 DFSClient contains the logging code:
 LOG.info("Exception in createBlockOutputStream " + ie);
 This would be better written with ie as the second argument to LOG.info, so 
 that the stack trace could be preserved.  As it is, I don't know how to start 
 debugging.





[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177317#comment-13177317
 ] 

Hudson commented on HDFS-2729:
--

Integrated in Hadoop-Common-trunk-Commit #1480 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1480/])
HDFS-2729. Update BlockManager's comments regarding the invalid block set 
(harsh)

harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java


 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.24.0

 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was covered at some point, the comments and logs 
 still carry presence of two sets when there really is just one set.
 This patch changes the logs and comments to be more accurate about that.





[jira] [Reopened] (HDFS-97) DFS should detect slow links(nodes) and avoid them

2011-12-29 Thread Harsh J (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HDFS-97:
-


Oh well, didn't notice the 'read' issue too. We cover writes with that, not 
reads. Reopening.

 DFS should detect slow links(nodes) and avoid them
 --

 Key: HDFS-97
 URL: https://issues.apache.org/jira/browse/HDFS-97
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Runping Qi

 The current DFS does not detect slow links (nodes).
 Thus, when a node or its network link is slow, it may affect the overall 
 system performance significantly.
 Specifically, when a map job needs to read data from such a node, it may 
 progress 10X slower.
 And when a DFS data node pipeline consists of such a node, the write 
 performance degrades significantly.
 This may lead to some long tails for map/reduce jobs. We have experienced 
 such behaviors quite often.





[jira] [Resolved] (HDFS-36) Handling of deprecated dfs.info.bindAddress and dfs.info.port

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-36.
-

Resolution: Cannot Reproduce

Can't reproduce on 1.0+. Setting dfs.http(s).address suffices.

 Handling of deprecated dfs.info.bindAddress and dfs.info.port
 -

 Key: HDFS-36
 URL: https://issues.apache.org/jira/browse/HDFS-36
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Windows XP
Reporter: Cagdas Gerede
Priority: Minor

 When checkpointing is triggered in Secondary name node, Secondary name node 
 throws exception while it tries to connect to Namenode's http server in the 
 following two cases:
 1) In hadoop-site.xml, if you put only dfs.http.address but not 
 dfs.info.bindAddress and dfs.info.port (Connection Refused Exception)
 2) In hadoop-site.xml, if you put only dfs.info.bindAddress and dfs.info.port 
 but not dfs.http.address (SecondaryNameNode.getServerAddress line 148 throws 
 exception since newAddrPort is null)
 Temporary Solution: If you put dfs.http.address, dfs.info.bindAddress, and 
 dfs.info.port, then SecondaryNameNode successfully fetches the image and log 
 from Namenode.





[jira] [Updated] (HDFS-2580) NameNode#main(...) can make use of GenericOptionsParser.

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2580:
--

Status: Open  (was: Patch Available)

 NameNode#main(...) can make use of GenericOptionsParser.
 

 Key: HDFS-2580
 URL: https://issues.apache.org/jira/browse/HDFS-2580
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.24.0

 Attachments: HDFS-2580.patch


 DataNode supports passing generic opts when calling via {{hdfs datanode}}. 
 NameNode can support the same thing as well, but doesn't right now.
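What GenericOptionsParser provides is, roughly, peeling generic options such as -D key=value off the command line into the configuration before the daemon parses its own arguments. A hypothetical plain-Java sketch of that pattern (not Hadoop's actual parser; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GenericOpts {
    /**
     * Move -D key=value pairs from args into conf and return the
     * remaining application-specific arguments.
     */
    static String[] parse(Map<String, String> conf, String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);  // value may contain '='
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                rest.add(args[i]);  // left for the daemon's own parsing
            }
        }
        return rest.toArray(new String[0]);
    }
}
```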





[jira] [Resolved] (HDFS-61) Datanode shutdown is called multiple times

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-61.
-

Resolution: Cannot Reproduce

On trunk, looks like we only call it once now. This has gone stale, closing out.

 Datanode shutdown is called multiple times 
 ---

 Key: HDFS-61
 URL: https://issues.apache.org/jira/browse/HDFS-61
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

 - When DataNode gets {{IncorrectVersionException}} in 
 {{DataNode.offerService()}} {{DataNode.shutdown()}} is called
 - In {{DataNode.processCommand()}} when DataNode gets DNA_SHUTDOWN, 
 {{DataNode.shutdown()}} is called
 {{DataNode.shutdown()}} is again called in {{DataNode.run()}} method
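A common guard against a shutdown path being entered multiple times is a compare-and-set flag, so only the first caller performs the teardown and later calls are no-ops. A hypothetical sketch, not DataNode's actual code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class Service {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private int shutdownCount = 0;  // for illustration only

    /** Returns true only for the call that actually performed the shutdown. */
    public boolean shutdown() {
        if (!shutdown.compareAndSet(false, true)) {
            return false;  // already shut down; repeated calls are harmless
        }
        shutdownCount++;
        // ... stop threads, close sockets, release storage here ...
        return true;
    }

    public int timesShutDown() { return shutdownCount; }
}
```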





[jira] [Commented] (HDFS-1910) when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice every time

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177126#comment-13177126
 ] 

Hudson commented on HDFS-1910:
--

Integrated in Hadoop-Hdfs-22-branch #124 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-22-branch/124/])
Remove erroneously added file while committing HDFS-1910.
HDFS-1910. NameNode should not save fsimage twice. Contributed by Konstantin 
Shvachko.
Revert. Refers to wrong jira HDFS-1910.
HDFS-1910. NameNode should not save fsimage twice. Contributed by Konstantin 
Shvachko.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225342
Files : 
* /hadoop/common/branches/branch-0.22/hdfs/bin/hadoop

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225337
Files : 
* /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.22/hdfs/bin/hadoop
* 
/hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225336
Files : 
* /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225333
Files : 
* /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java


 when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice 
 every time
 

 Key: HDFS-1910
 URL: https://issues.apache.org/jira/browse/HDFS-1910
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Gokul
Priority: Minor
  Labels: critical-0.22.0
 Fix For: 0.22.1

 Attachments: saveImageOnce-v0.22.patch


 When the image and edits directories are configured to be the same, the fsimage 
 is flushed from memory to disk twice whenever saveNamespace is done. This may 
 impact the performance of the backupnode/snn, which does a saveNamespace at 
 every checkpoint.
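One way to avoid the double save is to collapse the image and edits directory lists into a set of unique canonical paths before writing. A hypothetical sketch of that dedup step (not the actual FSImage code):

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SaveTargets {
    /**
     * Merge the image and edits directory lists into a set of unique
     * canonical paths, so a directory configured in both roles is
     * written to only once per saveNamespace.
     */
    static Set<File> uniqueTargets(List<File> imageDirs, List<File> editsDirs)
            throws IOException {
        Set<File> targets = new LinkedHashSet<>();  // preserves configured order
        for (File d : imageDirs) targets.add(d.getCanonicalFile());
        for (File d : editsDirs) targets.add(d.getCanonicalFile());
        return targets;
    }
}
```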





[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2728:
--

Status: Patch Available  (was: Open)

 Remove dfsadmin -printTopology from branch-1 docs since it does not exist
 -

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2728.patch


 It is documented that we have -printTopology, but we do not actually have it in 
 this branch. Possibly a docs mix-up from the security branch pre-merge?
 {code}
 ➜  branch-1  grep printTopology -R .
 ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:
   <code>-printTopology</code>
 ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  
 <code>-printTopology</code>
 {code}
 Let's remove the reference.





[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2729:
--

   Resolution: Fixed
Fix Version/s: 0.24.0
   Status: Resolved  (was: Patch Available)

Committed revision 1225591. Thanks Eli!

 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.24.0

 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was covered at some point, the comments and logs 
 still carry presence of two sets when there really is just one set.
 This patch changes the logs and comments to be more accurate about that.





[jira] [Resolved] (HDFS-57) A Datanode's datadir could have lots of blocks in the top-level directory

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-57.
-

Resolution: Not A Problem

Not a problem in the current {{FSDataset}} operations (neither on branch-1).

 A Datanode's datadir could have lots of blocks in the top-level directory
 -

 Key: HDFS-57
 URL: https://issues.apache.org/jira/browse/HDFS-57
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: dhruba borthakur

 When a datanode restarts, it moves all the blocks from the datadir's tmp 
 directory into the top-level of the datadir. It does not move these blocks 
 into subdirectories of the datadir.





[jira] [Resolved] (HDFS-82) recentInvalidateSets in FSNamesystem is not required

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-82.
-

Resolution: Not A Problem

This has been resolved on trunk. We only have one set.

 recentInvalidateSets in FSNamesystem is not required 
 -

 Key: HDFS-82
 URL: https://issues.apache.org/jira/browse/HDFS-82
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Raghu Angadi

 See HADOOP-2576 for more background. 
 When a file is deleted, blocks are first placed in recentInvalidateSets and 
 then later computeDatanodeWork moves it to 'invalidateSet' for each datanode. 
 I could not see why a block is placed in this intermediate set. I think it is 
 confusing as well.. for example, -metasave prints blocks from only one list. 
 Unless we read very carefully its not easy to figure out that there are two 
 lists. My proposal is to keep only one of them.





[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2728:
--

Status: Open  (was: Patch Available)

 Remove dfsadmin -printTopology from branch-1 docs since it does not exist
 -

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2728.patch


 It is documented that we have -printTopology, but we do not actually have it in 
 this branch. Possibly a docs mix-up from the security branch pre-merge?
 {code}
 ➜  branch-1  grep printTopology -R .
 ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:
   <code>-printTopology</code>
 ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  
 <code>-printTopology</code>
 {code}
 Let's remove the reference.





[jira] [Updated] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HDFS-2654:
-

 Description: The BlockReaderLocal code paths are easier to understand 
(especially true on branch-1 where BlockReaderLocal inherits code from 
BlockerReader and FSInputChecker) if the local and remote block reader 
implementations are independent, and they're not really sharing much code 
anyway. If for some reason they start to share significant code we can make the 
BlockReader interface an abstract class.  (was: The BlockReaderLocal code paths 
are easier to understand (especially true on branch-1 where BlockReaderLocal 
inherits code from BlockerReader and FSInputChecker) if the local and remote 
block reader implementations are independent, and they're not really sharing 
much code anyway. If for some reason they start to share sifnificant code we 
can make the BlockReader interface an abstract class.)
Target Version/s: 0.23.1, 1.1.0  (was: 1.1.0, 0.23.1)

 Make BlockReaderLocal not extend RemoteBlockReader2
 ---

 Key: HDFS-2654
 URL: https://issues.apache.org/jira/browse/HDFS-2654
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.23.1, 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, 
 hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, 
 hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, 
 hdfs-2654-b1-4.patch


 The BlockReaderLocal code paths are easier to understand (especially true on 
 branch-1 where BlockReaderLocal inherits code from BlockerReader and 
 FSInputChecker) if the local and remote block reader implementations are 
 independent, and they're not really sharing much code anyway. If for some 
 reason they start to share significant code we can make the BlockReader 
 interface an abstract class.





[jira] [Commented] (HDFS-2698) BackupNode is downloading image from NameNode for every checkpoint

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177125#comment-13177125
 ] 

Hudson commented on HDFS-2698:
--

Integrated in Hadoop-Hdfs-22-branch #124 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-22-branch/124/])
HDFS-2698. BackupNode is downloading image from NameNode for every 
checkpoint. Contributed by Konstantin Shvachko.

shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225340
Files : 
* /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java


 BackupNode is downloading image from NameNode for every checkpoint
 --

 Key: HDFS-2698
 URL: https://issues.apache.org/jira/browse/HDFS-2698
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 0.22.1

 Attachments: rollFSImage.patch, rollFSImage.patch


 BackupNode can make periodic checkpoints without downloading image and edits 
 files from the NameNode, but with just saving the namespace to local disks. 
 This is not happening because NN renews checkpoint time after every 
 checkpoint, thus making its image ahead of the BN's even though they are in 
 sync.





[jira] [Created] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Harsh J (Created) (JIRA)
Update BlockManager's comments regarding the invalid block set
--

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


Looks like after HDFS-82 was covered at some point, the comments and logs still 
carry presence of two sets when there really is just one set.

This patch changes the logs and comments to be more accurate about that.





[jira] [Resolved] (HDFS-19) Unhandled exceptions in DFSClient

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-19.
-

Resolution: Invalid

This has gone stale. I do not find these methods in the current 
DFSOutputStream. Do open a new one if there is still trouble with the newer 
impl.

 Unhandled exceptions in DFSClient
 -

 Key: HDFS-19
 URL: https://issues.apache.org/jira/browse/HDFS-19
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Konstantin Shvachko

 DFSOutputStream.handleSocketException() does not handle exceptions thrown 
 inside it
 by abandonBlock(). I'd propose to retry abandonBlock() in case of timeout.
 In case of DFSOutputStream.close() the exception in handleSocketException() 
 will result in
 calling abandonFileInProgress().
 In a similar case of DFSOutputStream.flush() the file will not be abandoned.
 Exceptions thrown by abandonFileInProgress() are not handled either.
 Feels like we need a general mechanism for handling all these things.
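The retry-on-timeout proposal above can be sketched as a small helper that retries an action only when it times out and rethrows anything else immediately. Hypothetical names, not DFSClient's actual code:

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RetryOnTimeout {
    /**
     * Run an action, retrying only on SocketTimeoutException up to
     * maxAttempts times; any other exception propagates immediately.
     */
    static <T> T withRetries(Callable<T> action, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SocketTimeoutException ste) {
                last = ste;  // timed out: loop and retry
            }
        }
        throw last;  // all attempts timed out
    }
}
```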





[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177141#comment-13177141
 ] 

Hudson commented on HDFS-2726:
--

Integrated in Hadoop-Hdfs-trunk #909 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/909/])
HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream 
method (harsh)

harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java


 Exception in createBlockOutputStream shouldn't delete exception stack trace
 -

 Key: HDFS-2726
 URL: https://issues.apache.org/jira/browse/HDFS-2726
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Michael Bieniosek
Assignee: Harsh J
 Fix For: 0.24.0

 Attachments: HDFS-2726.patch


 I'm occasionally (1/5000 times) getting this error after upgrading everything 
 to hadoop-0.18:
 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream 
 java.io.IOException: Could not read from stream
 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block 
 blk_624229997631234952_8205908
 DFSClient contains the logging code:
 LOG.info("Exception in createBlockOutputStream " + ie);
 This would be better written with ie as the second argument to LOG.info, so 
 that the stack trace could be preserved.  As it is, I don't know how to start 
 debugging.





[jira] [Resolved] (HDFS-22) Help information of refreshNodes does not show how to decommission nodes

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-22.
-

Resolution: Not A Problem

The current docs:

{code}
Updates the set of hosts allowed to connect to the namenode. Re-reads the config 
file to update values defined by dfs.hosts and dfs.hosts.exclude and reads the 
entries (hostnames) in those files. Each entry not defined in dfs.hosts but in 
dfs.hosts.exclude is decommissioned. Each entry defined in dfs.hosts and also 
in dfs.hosts.exclude is stopped from decommissioning if it has already been 
marked for decommission. Entries not present in both lists are 
decommissioned.
{code}

Covers it pretty much I think? Please reopen if not.

 Help information of refreshNodes does not show how to decommission nodes
 ---

 Key: HDFS-22
 URL: https://issues.apache.org/jira/browse/HDFS-22
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: hadoop 0.19.1, jdk 1.6, CentOS 5.2
Reporter: Wang Xu
Assignee: Wang Xu
 Attachments: refreshNodes.patch


 The help information does not indicate how to decommission nodes.
 It only describes two scenarios:
 * stop nodes that are not in dfs.hosts
 * stop decommissioning if a node is decommissioning and is in both dfs.hosts 
 and dfs.hosts.exclude
 but omits this one:
 * start decommissioning if a node is in service and is in both dfs.hosts and 
 dfs.hosts.exclude
 It would be better described as: "Each entry defined in dfs.hosts and also in 
 dfs.hosts.exclude starts decommissioning, with block replication, if it is in 
 service, or is stopped from decommissioning if it has already been marked for 
 decommission."
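The scenarios above reduce to a membership test against the include and exclude lists. A hypothetical sketch of that decision table (not the NameNode's actual logic; an empty include list is treated here as "allow everyone", matching the documented dfs.hosts behavior):

```java
import java.util.Set;

public class HostPolicy {
    enum Action { ALLOW, DECOMMISSION, DISALLOW }

    /**
     * Decide what refreshNodes should do with a datanode, given the
     * include list (dfs.hosts) and exclude list (dfs.hosts.exclude).
     */
    static Action decide(String host, Set<String> include, Set<String> exclude) {
        boolean included = include.isEmpty() || include.contains(host);
        if (!included) {
            return Action.DISALLOW;       // not in dfs.hosts: stop the node
        }
        if (exclude.contains(host)) {
            return Action.DECOMMISSION;   // in both lists: decommission it
        }
        return Action.ALLOW;              // in service, keep serving
    }
}
```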





[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2728:
--

Attachment: HDFS-2728.patch

 Remove dfsadmin -printTopology from branch-1 docs since it does not exist
 -

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2728.patch


 It is documented that we have -printTopology, but we do not actually have it in 
 this branch. Possibly a docs mix-up from the security branch pre-merge?
 {code}
 ➜  branch-1  grep printTopology -R .
 ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:
   <code>-printTopology</code>
 ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  
 <code>-printTopology</code>
 {code}
 Let's remove the reference.





[jira] [Resolved] (HDFS-102) high cpu usage in ReplicationMonitor thread

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-102.
--

Resolution: Cannot Reproduce

This has gone stale. The current structure within BlockManager isn't a list 
anymore, and we haven't seen this kind of behavior in quite a while.

 high cpu usage in ReplicationMonitor thread 
 

 Key: HDFS-102
 URL: https://issues.apache.org/jira/browse/HDFS-102
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Koji Noguchi

 We had a namenode stuck at 99% CPU, and it was showing slow response times.
 (dfs.namenode.handler.count was still set to 10.)
 ReplicationMonitor thread was using the most CPU time.
 Jstack showed,
 "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@1c7b0f4d" daemon 
 prio=10 tid=0x002d90690800 nid=0x4855 runnable 
 [0x41941000..0x41941b30]
java.lang.Thread.State: RUNNABLE
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at 
 org.apache.hadoop.dfs.FSNamesystem.blocksToInvalidate(FSNamesystem.java:2475)
   - locked 0x002a9f522038 (a org.apache.hadoop.dfs.FSNamesystem)
   at 
 org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1775)
   at 
 org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1713)
   at java.lang.Thread.run(Thread.java:619)
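The hot frame above is AbstractList$Itr.remove: deleting elements one at a time through an iterator costs O(n) per removal on an array-backed list, which turns draining a large invalidate list into quadratic work. A hypothetical sketch of the drain pattern, where the choice of collection decides the cost:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DrainDemo {
    /** Remove up to limit elements from the head of a work queue. */
    static <T> List<T> drain(List<T> work, int limit) {
        List<T> batch = new ArrayList<>();
        Iterator<T> it = work.iterator();
        while (it.hasNext() && batch.size() < limit) {
            batch.add(it.next());
            it.remove();  // O(1) on LinkedList; O(n) array copy on ArrayList
        }
        return batch;
    }
}
```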





[jira] [Resolved] (HDFS-20) fsck path -delete doesn't report failures

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-20.
-

Resolution: Not A Problem

Currently in NamenodeFsck, if any operation under check() throws an exception, 
I can verify it is definitely logged.

Not a problem anymore.

 fsck path -delete doesn't report failures
 ---

 Key: HDFS-20
 URL: https://issues.apache.org/jira/browse/HDFS-20
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Sameer Paranjpye

 When I have safemode on and do fsck / -delete, it legitimately fails on the 
 first delete. However, the fsck stops and does not report the failure.





[jira] [Resolved] (HDFS-146) Regression: TestInjectionForSimulatedStorage fails with IllegalMonitorStateException

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-146.
--

Resolution: Cannot Reproduce

Hasn't had a failure report in two years now. Gone stale, closing out this and 
related issues.

 Regression: TestInjectionForSimulatedStorage fails with 
 IllegalMonitorStateException
 

 Key: HDFS-146
 URL: https://issues.apache.org/jira/browse/HDFS-146
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: gary murry

 org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection fails 
 with IllegalMonitorStateException
 Stacktrace
 java.lang.IllegalMonitorStateException
   at java.lang.Object.notifyAll(Native Method)
   at org.apache.hadoop.ipc.Server.stop(Server.java:1110)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:574)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:569)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:553)
   at 
 org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection(TestInjectionForSimulatedStorage.java:195)
 No errors show up in the standard output, but there are a few warnings.
 http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/749/testReport/org.apache.hadoop.hdfs/TestInjectionForSimulatedStorage/testInjection/





[jira] [Resolved] (HDFS-92) if hadoop.tmp.dir is under your dfs.data.dir, HDFS will silently wipe out your name directory

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-92.
-

Resolution: Not A Problem

This goes against all recommendations for configuring the directories. I don't 
see why one would configure it this way, as it leads to an obvious issue. The 
same goes for merging mapred.local.dir and dfs.datanode.data.dir. Resolving as 
not a problem.

 if hadoop.tmp.dir is under your dfs.data.dir, HDFS will silently wipe out 
 your name directory
 ---

 Key: HDFS-92
 URL: https://issues.apache.org/jira/browse/HDFS-92
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: gentoo linux on Intel/Dell w/ Sun JDK
Reporter: Brian Karlak

 I used a hadoop-site.xml conf file like:
   <property>
     <name>dfs.data.dir</name>
     <value>/data01/hadoop</value>
     <description>Dirs to store data on.</description>
   </property>
   <property>
     <name>hadoop.tmp.dir</name>
     <value>/data01/hadoop/tmp</value>
     <description>A base for other temporary directories.</description>
   </property>
 This file will format the namenode properly.  Upon startup with the 
 bin/start-dfs.sh script, however, the /data01/hadoop/tmp/dfs/name directory 
 is silently wiped out.  This foobars the namenode, but only after the next 
 DFS stop/start cycle.  (see output below)
 This is obviously a configuration error first and foremost, but the fact that 
 hadoop silently corrupts itself makes it tricky to track down.
 [hid191]$ bin/hadoop namenode -format
 08/04/04 18:41:43 INFO dfs.NameNode: STARTUP_MSG: 
 /
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = hid191.dev01.corp.metaweb.com/127.0.0.1
 STARTUP_MSG:   args = [-format]
 STARTUP_MSG:   version = 0.16.2
 STARTUP_MSG:   build = 
 http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 642481; 
 compiled by 'hadoopqa' on Sat Mar 29 01:59:04 UTC 2008
 /
 08/04/04 18:41:43 INFO fs.FSNamesystem: fsOwner=zenkat,users
 08/04/04 18:41:43 INFO fs.FSNamesystem: supergroup=supergroup
 08/04/04 18:41:43 INFO fs.FSNamesystem: isPermissionEnabled=true
 08/04/04 18:41:43 INFO dfs.Storage: Storage directory 
 /data01/hadoop/tmp/dfs/name has been successfully formatted.
 08/04/04 18:41:43 INFO dfs.NameNode: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down NameNode at 
 hid191.dev01.corp.metaweb.com/127.0.0.1
 /
 [hid191]$ ls /data01/hadoop/tmp/dfs/name
 current  image
 [hid191]$ bin/start-dfs.sh 
 starting namenode, logging to 
 /data01/hadoop/logs/hadoop-zenkat-namenode-hid191.out
 localhost: starting datanode, logging to 
 /data01/hadoop/logs/hadoop-zenkat-datanode-hid191.out
 localhost: starting secondarynamenode, logging to 
 /data01/hadoop/logs/hadoop-zenkat-secondarynamenode-hid191.out
 [hid191]$ ls /data01/hadoop/tmp/dfs/name
 ls: cannot access /data01/hadoop/tmp/dfs/name: No such file or directory
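 One way to avoid this failure mode is to keep hadoop.tmp.dir outside of every 
 dfs.data.dir, so a DataNode cleanup can never touch the name directory 
 underneath it. A minimal hadoop-site.xml sketch (the paths are illustrative, 
 not taken from the report):
 <property>
   <name>dfs.data.dir</name>
   <value>/data01/hadoop/data</value>
 </property>
 <property>
   <name>hadoop.tmp.dir</name>
   <value>/data01/hadoop-tmp</value>
 </property>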





[jira] [Commented] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177164#comment-13177164
 ] 

Hadoop QA commented on HDFS-2728:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508838/HDFS-2728.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1745//console

This message is automatically generated.

 Remove dfsadmin -printTopology from branch-1 docs since it does not exist
 -

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2728.patch


 It is documented that we have -printTopology, but we do not really have it in 
 this branch. Possibly a docs mixup from somewhere in the security branch 
 pre-merge?
 {code}
 ➜  branch-1  grep printTopology -R .
 ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:   <code>-printTopology</code>
 ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  <code>-printTopology</code>
 {code}
 Let's remove the reference.





[jira] [Resolved] (HDFS-64) delete on dfs hung

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-64.
-

Resolution: Not A Problem

This has gone stale, and given that we haven't seen this recently at all, it 
looks like it may have been fixed inadvertently.

 delete on dfs hung
 --

 Key: HDFS-64
 URL: https://issues.apache.org/jira/browse/HDFS-64
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Devaraj Das

 I had a case where the JobTracker was trying to delete some files, as part of 
 Garbage Collect for a job, in a dfs directory. The thread hung and this is 
 the trace:
 Thread 19 (IPC Server handler 5 on 57344):
   State: WAITING
   Blocked count: 137022
   Waited count: 336004
   Waiting on org.apache.hadoop.ipc.Client$Call@eb6238
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 org.apache.hadoop.ipc.Client.call(Client.java:683)
 org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
 sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 java.lang.reflect.Method.invoke(Method.java:597)
 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
 org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515)
 
 org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170)
 org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118)
 org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114)
 
 org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635)
 
 org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387)
 
 org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348)
 
 org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565)
 
 org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032)
 and it hung for an enormously long amount of time ~1 hour. 
 Not sure whether these will help:
 I saw this message in the NameNode log around the time the delete was issued 
 by the JobTracker
 2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR* 
 FSDirectory.unprotectedDelete: failed to remove 
 /mapredsystem/ddas/mapredsystem/10091.{running.machine.com}/job_200805070458_0004
  because it does not exist
 I also checked that the directory in question was actually there (and the job 
 couldn't have run without this directory being there).





[jira] [Resolved] (HDFS-12) hadoop dfs -put does not return nonzero status on failure

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-12.
-

   Resolution: Not A Problem
Fix Version/s: 0.23.0

This has been fixed by the FsCommand revamp on 0.23+.

 hadoop dfs -put does not return nonzero status on failure
 ---

 Key: HDFS-12
 URL: https://issues.apache.org/jira/browse/HDFS-12
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Karl Anderson
 Fix For: 0.23.0


 I'm attempting to put a file on DFS with the hadoop dfs -put command.  The 
 put is failing, probably because my cluster is still being initialized, but 
 the command is still returning a status of 0.  
 If there was a meaningful error status, I'd be able to handle the situation 
 (in my case, waiting and putting again works).
 The output is telling me there is a NotReplicatedYetException; it's a new 
 cluster and the nodes are still being initialized.
 Here's the beginning of the output; it tries a few times, but eventually 
 gives up.
 executing: source ~/.bash_profile; hadoop dfs -put ./vectorfile 
 input/vectorfile
 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 
 is a deprecated filesystem name. Use 
 hdfs://ip-10-251-195-162.ec2.internal:50001/ instead.
 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 
 is a deprecated filesystem name. Use 
 hdfs://ip-10-251-195-162.ec2.internal:50001/ instead.
 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 
 is a deprecated filesystem name. Use 
 hdfs://ip-10-251-195-162.ec2.internal:50001/ instead.
 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 
 is a deprecated filesystem name. Use 
 hdfs://ip-10-251-195-162.ec2.internal:50001/ instead.
 08/08/21 13:06:01 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: 
 java.io.IOException: File /user/root/input/vectorfile could only be 
 replicated to 0 nodes, instead of 1
   at 
 org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1117)
   at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
   at org.apache.hadoop.ipc.Client.call(Client.java:715)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
   at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
   at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2440)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2323)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
 08/08/21 13:06:01 WARN dfs.DFSClient: NotReplicatedYetException sleeping 
 /user/root/input/vectorfile retries left 4
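 Until the shell itself returns a nonzero status, callers can only approximate 
 the requested behavior by retrying. A hedged sketch of such a wrapper (the 
 hadoop dfs -put invocation is the hypothetical target; the retry function 
 itself is generic and not part of Hadoop):

```shell
# Generic retry wrapper: run a command until it succeeds, up to MAX_TRIES
# attempts, returning 1 if every attempt fails. This only helps once the
# wrapped command reports failure via its exit status -- which is exactly
# the fix this issue asks for.
retry() {
  local n=0 max="${MAX_TRIES:-5}"
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    sleep "${RETRY_DELAY:-10}"
  done
}

# Hypothetical usage once -put returns nonzero on failure:
#   retry hadoop dfs -put ./vectorfile input/vectorfile
```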





[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177250#comment-13177250
 ] 

Eli Collins commented on HDFS-2729:
---

+1  findbugs and test failure are unrelated.

 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was covered at some point, the comments and logs 
 still carry presence of two sets when there really is just one set.
 This patch changes the logs and comments to be more accurate about that.





[jira] [Resolved] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2728.
---

   Resolution: Fixed
Fix Version/s: 1.1.0

Committed revision 1225589. Thanks Eli!

 Remove dfsadmin -printTopology from branch-1 docs since it does not exist
 -

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 1.1.0

 Attachments: HDFS-2728.patch


 It is documented that we have -printTopology, but we do not really have it in 
 this branch. Possibly a docs mixup from somewhere in the security branch 
 pre-merge?
 {code}
 ➜  branch-1  grep printTopology -R .
 ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:   <code>-printTopology</code>
 ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  <code>-printTopology</code>
 {code}
 Let's remove the reference.





[jira] [Resolved] (HDFS-59) No recovery when trying to replicate on marginal datanode

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-59.
-

Resolution: Not A Problem

This has gone stale; we haven't seen this lately. Let's file a new one if we see 
this again (these days it errors out with 'Could only replicate to X nodes' 
kinds of errors).

Also, it could have been your dfs.replication.min > 1.

 No recovery when trying to replicate on marginal datanode
 -

 Key: HDFS-59
 URL: https://issues.apache.org/jira/browse/HDFS-59
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Sep 14 nightly build with a couple of mapred-related 
 patches
Reporter: Christian Kunz

 We have been uploading a lot of data to hdfs, running about 400 scripts in 
 parallel calling hadoop's command line utility in distributed fashion. Many 
 of them started to hang when copying large files (120GB), repeating the 
 following messages without end:
 07/10/05 15:44:25 INFO fs.DFSClient: Could not complete file, retrying...
 07/10/05 15:44:26 INFO fs.DFSClient: Could not complete file, retrying...
 07/10/05 15:44:26 INFO fs.DFSClient: Could not complete file, retrying...
 07/10/05 15:44:27 INFO fs.DFSClient: Could not complete file, retrying...
 07/10/05 15:44:27 INFO fs.DFSClient: Could not complete file, retrying...
 07/10/05 15:44:28 INFO fs.DFSClient: Could not complete file, retrying...
 In the namenode log I eventually found repeated messages like:
 2007-10-05 14:40:08,063 WARN org.apache.hadoop.fs.FSNamesystem: 
 PendingReplicationMonitor timed out block blk_3124504920241431462
 2007-10-05 14:40:11,876 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask IP4:50010 to replicate 
 blk_3124504920241431462 to datanode(s) IP4_1:50010
 2007-10-05 14:45:08,069 WARN org.apache.hadoop.fs.FSNamesystem: 
 PendingReplicationMonitor timed out block blk_8533614499490422104
 2007-10-05 14:45:08,070 WARN org.apache.hadoop.fs.FSNamesystem: 
 PendingReplicationMonitor timed out block blk_7741954594593177224
 2007-10-05 14:45:13,973 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask IP4:50010 to replicate 
 blk_7741954594593177224 to datanode(s) IP4_2:50010
 2007-10-05 14:45:13,973 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
 NameSystem.pendingTransfer: ask IP4:50010 to replicate 
 blk_8533614499490422104 to datanode(s) IP4_3:50010
 I could not ssh to the node with IP address IP4, but seemingly the datanode 
 server still sent heartbeats. After rebooting the node it was okay again, and 
 a few files and a few clients recovered, but not all.
 I restarted these clients and they completed this time (before noticing the 
 marginal node we restarted the clients twice without success).
 I would conclude that the existence of the marginal node must have caused 
 loss of blocks, at least in the tracking mechanism, in addition to eternal 
 retries.
 In summary, dfs should be able to handle datanodes with good heartbeat but 
 otherwise failing to do their job. This should include datanodes that have a 
 high rate of socket connection timeouts.





[jira] [Resolved] (HDFS-104) TestInjectionForSimulatedStorage fails once in a while

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-104.
--

Resolution: Cannot Reproduce

Hasn't had a failure report in two years now. Gone stale, closing out this and 
related issues.

 TestInjectionForSimulatedStorage fails once in a while
 --

 Key: HDFS-104
 URL: https://issues.apache.org/jira/browse/HDFS-104
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lohit Vijayarenu

 TestInjectionForSimulatedStorage fails once in a while.





[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-1314:
--

Target Version/s: 0.24.0
  Status: Patch Available  (was: Open)

+1. Will commit once Hudson reports its build.

 dfs.block.size accepts only absolute value
 --

 Key: HDFS-1314
 URL: https://issues.apache.org/jira/browse/HDFS-1314
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Karim Saadah
Assignee: Sho Shimauchi
Priority: Minor
  Labels: newbie
 Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt


 Using dfs.block.size=8388608 works 
 but dfs.block.size=8mb does not.
 Using dfs.block.size=8mb should at least log a WARNING about the 
 NumberFormatException.
 (http://pastebin.corp.yahoo.com/56129)
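 The suffix handling being asked for amounts to a small parse step. A hedged 
 shell sketch of the conversion (the function name and accepted suffixes are 
 illustrative, not Hadoop's actual parser):

```shell
# Convert a size string with an optional binary suffix (k/kb, m/mb, g/gb)
# to bytes; plain numeric values pass through unchanged.
to_bytes() {
  local v
  v=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case "$v" in
    *kb|*k) echo $(( ${v%%k*} * 1024 )) ;;
    *mb|*m) echo $(( ${v%%m*} * 1024 * 1024 )) ;;
    *gb|*g) echo $(( ${v%%g*} * 1024 * 1024 * 1024 )) ;;
    *)      echo "$v" ;;
  esac
}

to_bytes 8mb   # 8388608, the same absolute value that works today
```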





[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2728:
--

Target Version/s: 1.1.0  (was: 0.24.0)

 Remove dfsadmin -printTopology from branch-1 docs since it does not exist
 -

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2728.patch


 It is documented that we have -printTopology, but we do not really have it in 
 this branch. Possibly a docs mixup from somewhere in the security branch 
 pre-merge?
 {code}
 ➜  branch-1  grep printTopology -R .
 ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:   <code>-printTopology</code>
 ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  <code>-printTopology</code>
 {code}
 Let's remove the reference.





[jira] [Resolved] (HDFS-105) Streaming task stuck in MapTask$DirectMapOutputCollector.close

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-105.
--

Resolution: Cannot Reproduce

Hasn't had a similar failure report in two years now. Gone stale, so closing 
out as Cannot Reproduce. Let's open a new one should we face this again (it 
looks transient).

 Streaming task stuck in MapTask$DirectMapOutputCollector.close
 --

 Key: HDFS-105
 URL: https://issues.apache.org/jira/browse/HDFS-105
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Amareshwari Sriramadasu
 Attachments: thread_dump.txt


 Observed a streaming task stuck in MapTask$DirectMapOutputCollector.close





[jira] [Resolved] (HDFS-55) Change all references of dfs to hdfs in configs

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-55.
-

Resolution: Won't Fix

It's all dfs.*, it's in Hadoop, and the settings go into hdfs-site.xml. I think 
that is sufficient.

Don't think it's worth the change. Feel free to reopen if you feel otherwise 
strongly.

 Change all references of dfs to hdfs in configs
 ---

 Key: HDFS-55
 URL: https://issues.apache.org/jira/browse/HDFS-55
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lohit Vijayarenu

 After the code restructuring, dfs has been changed to hdfs, but I still see 
 config variables named dfs.something, e.g. dfs.http.address. Should we change 
 everything to hdfs?





[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2729:
--

Attachment: HDFS-2729.patch

 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was covered at some point, the comments and logs 
 still carry presence of two sets when there really is just one set.
 This patch changes the logs and comments to be more accurate about that.





[jira] [Resolved] (HDFS-103) handle return value of globStatus() to be uniform.

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-103.
--

Resolution: Not A Problem

Looking at the current implementation of globStatus, we always return an empty 
FileStatus[], never null.

Not a problem anymore.

 handle return value of globStatus() to be uniform.
 --

 Key: HDFS-103
 URL: https://issues.apache.org/jira/browse/HDFS-103
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lohit Vijayarenu

 Some places in code does not expect null value from globStatus(Path path), 
 they expect path. These have to be fixed to handle null to be uniform.





[jira] [Resolved] (HDFS-44) Unit test failed: TestInjectionForSimulatedStorage

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-44.
-

Resolution: Cannot Reproduce

Hasn't had a failure report in two years now. Gone stale, closing out this and 
related issues.

 Unit test failed: TestInjectionForSimulatedStorage
 --

 Key: HDFS-44
 URL: https://issues.apache.org/jira/browse/HDFS-44
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Mukund Madhugiri

 Unit test failed: TestInjectionForSimulatedStorage failed in the nightly 
 build with a timeout:
 tail from the console:
 [junit] 2007-12-12 12:02:18,674 INFO  dfs.TestInjectionForSimulatedStorage 
 (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not 
 enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5.
 [junit] 2007-12-12 12:02:19,184 INFO  
 dfs.TestInjectionForSimulatedStorage 
 (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not 
 enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5.
 [junit] 2007-12-12 12:02:19,694 INFO  
 dfs.TestInjectionForSimulatedStorage 
 (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not 
 enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5.
 [junit] 2007-12-12 12:02:20,204 INFO  
 dfs.TestInjectionForSimulatedStorage 
 (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not 
 enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5.
 [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
 [junit] Test org.apache.hadoop.dfs.TestInjectionForSimulatedStorage 
 FAILED (timeout)
 Complete console log:
 http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/330/console





[jira] [Created] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Harsh J (Created) (JIRA)
Remove dfsadmin -printTopology from branch-1 docs since it does not exist
-

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


It is documented that we have -printTopology, but we do not really have it in 
this branch. Possibly a docs mixup from somewhere in the security branch 
pre-merge?

{code}
➜  branch-1  grep printTopology -R .
./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:   <code>-printTopology</code>
./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  <code>-printTopology</code>
{code}

Let's remove the reference.





[jira] [Resolved] (HDFS-35) Confusing set replication message

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-35.
-

Resolution: Incomplete

Unsure what the problem is here. Reading the FSNamesystem code as it stands 
today, those logs are in if-else clauses and distinguish the increase vs. 
decrease cases just fine.

 Confusing set replication message
 -

 Key: HDFS-35
 URL: https://issues.apache.org/jira/browse/HDFS-35
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Raghu Angadi
Priority: Minor

 If a file has a replication of 3 and setReplication() is used to set the 
 replication to 1, we will see the following log in the NameNode log: 
 {noformat}
 2007-08-07 12:18:27,370 INFO  fs.FSNamesystem 
 (FSNamesystem.java:setReplicationInternal(661)) - Increasing replication for 
 file /srcdat/2725423627829963655. New replication is 1
 2007-08-07 12:18:27,370 INFO  fs.FSNamesystem 
 (FSNamesystem.java:setReplicationInternal(668)) - Reducing replication for 
 file /srcdat/2725423627829963655. New replication is 1
 {noformat}
 Fixing this could be trivial.





[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177211#comment-13177211
 ] 

Hadoop QA commented on HDFS-2729:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508841/HDFS-2729.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated 20 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

-1 release audit.  The applied patch generated 1 release audit warnings 
(more than the trunk's current 0 warnings).

-1 core tests.  The patch failed these unit tests:
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1746//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1746//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1746//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//console

This message is automatically generated.

 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was covered at some point, the comments and logs 
 still refer to two sets when there is really just one.
 This patch changes the logs and comments to be accurate about that.





[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177110#comment-13177110
 ] 

Hadoop QA commented on HDFS-1314:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508830/hdfs-1314.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1744//console

This message is automatically generated.

 dfs.block.size accepts only absolute value
 --

 Key: HDFS-1314
 URL: https://issues.apache.org/jira/browse/HDFS-1314
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Karim Saadah
Assignee: Sho Shimauchi
Priority: Minor
  Labels: newbie
 Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt


 Using dfs.block.size=8388608 works 
 but dfs.block.size=8mb does not.
 Using dfs.block.size=8mb should throw some WARNING on NumberFormatException.
 (http://pastebin.corp.yahoo.com/56129)
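 For reference, the behavior being asked for is suffix-aware size parsing along 
 the lines of HADOOP-7910's {{Configuration.getLongBytes}}. A standalone sketch 
 of the idea (the helper below is illustrative, not the actual Hadoop API):
 {code}
// Illustrative sketch of suffix-aware size parsing: "8m" -> 8 * 1024 * 1024.
// The method name and supported suffixes are assumptions, not Hadoop's API.
public class SizeParser {
    static long parseBytes(String v) {
        String s = v.trim().toLowerCase();
        long mult = 1L;
        char last = s.charAt(s.length() - 1);
        switch (last) {
            case 'k': mult = 1L << 10; break;
            case 'm': mult = 1L << 20; break;
            case 'g': mult = 1L << 30; break;
            case 't': mult = 1L << 40; break;
            default: break; // plain number, no suffix
        }
        if (mult != 1L) {
            s = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(s) * mult;
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("8m")); // prints 8388608
    }
}
 {code}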





[jira] [Commented] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist

2011-12-29 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177251#comment-13177251
 ] 

Eli Collins commented on HDFS-2728:
---

+1. Don't think test-patch results on branch-1 are needed, as the change is 
trivial.

 Remove dfsadmin -printTopology from branch-1 docs since it does not exist
 -

 Key: HDFS-2728
 URL: https://issues.apache.org/jira/browse/HDFS-2728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2728.patch


 It is documented that we have -printTopology, but we do not actually have it 
 in this branch. Possible docs mixup from somewhere in the security branch 
 pre-merge?
 {code}
 ➜  branch-1  grep printTopology -R .
 ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base:
   <code>-printTopology</code>
 ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml:  
 <code>-printTopology</code>
 {code}
 Let's remove the reference.





[jira] [Updated] (HDFS-2580) NameNode#main(...) can make use of GenericOptionsParser.

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2580:
--

Status: Patch Available  (was: Open)

Resubmitting for tests.

I don't see an elegant way to use Tool interface, given the createNamenode(…) 
static call required to initialize 'this'. This should suffice.
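For context, the generic-options handling being discussed boils down to 
consuming options such as {{-D key=value}} before the command itself is parsed. 
A standalone mimic of that behavior (this is not GenericOptionsParser itself, 
just an illustration of what it gives NameNode#main):

{code}
// Illustrative mimic of GenericOptionsParser-style handling: -D key=value
// pairs are consumed into a property map, the rest is left for the command.
import java.util.*;

public class GenericArgs {
    static final Map<String, String> props = new LinkedHashMap<>();

    static String[] parse(String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                props.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] rest = parse(new String[] {"-D", "dfs.replication=2", "-format"});
        System.out.println(props + " " + Arrays.toString(rest));
    }
}
{code}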

 NameNode#main(...) can make use of GenericOptionsParser.
 

 Key: HDFS-2580
 URL: https://issues.apache.org/jira/browse/HDFS-2580
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.24.0

 Attachments: HDFS-2580.patch


 DataNode supports passing generic opts when calling via {{hdfs datanode}}. 
 NameNode can support the same thing as well, but doesn't right now.





[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-2729:
--

Status: Patch Available  (was: Open)

Trivial patch that changes comments and log statements. No tests required.

 Update BlockManager's comments regarding the invalid block set
 --

 Key: HDFS-2729
 URL: https://issues.apache.org/jira/browse/HDFS-2729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-2729.patch


 Looks like after HDFS-82 was covered at some point, the comments and logs 
 still refer to two sets when there is really just one.
 This patch changes the logs and comments to be accurate about that.





[jira] [Resolved] (HDFS-47) dead datanodes because of OutOfMemoryError

2011-12-29 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-47.
-

Resolution: Not A Problem

This has gone stale. FWIW, I haven't seen DNs go OOM on their own in recent 
years. Probably a leak that has since been fixed?

Resolving as Not a Problem (anymore).

 dead datanodes because of OutOfMemoryError
 --

 Key: HDFS-47
 URL: https://issues.apache.org/jira/browse/HDFS-47
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Christian Kunz

 We see more dead datanodes than in previous releases. The common exception is 
 found in the out file:
 Exception in thread org.apache.hadoop.dfs.DataBlockScanner@18166e5 
 java.lang.OutOfMemoryError: Java heap space
 Exception in thread DataNode: [dfs.data.dir-value] 
 java.lang.OutOfMemoryError: Java heap space





[jira] [Commented] (HDFS-2692) HA: Bugs related to failover from/into safe-mode

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177367#comment-13177367
 ] 

Todd Lipcon commented on HDFS-2692:
---

bq. In FSEditLogLoader#loadFSEdits, should we really be unconditionally calling 
FSNamesystem#notifyGenStampUpdate in the finally block? What if an error occurs 
and maxGenStamp is never updated in FSEditLogLoader#loadEditRecords

This should be OK -- we'll just call it with the argument 0, which won't cause 
any problem (0 is lower than any possible queued gen stamp).

bq. sp. Initiatling in TestHASafeMode#testComplexFailoverIntoSafemode
fixed

bq. In FSNamesystem#notifyGenStampUpdate, could be a better log message, and 
the log level should probably not be info: LOG.info(= notified of genstamp 
update for:  + gs);
Fixed and changed to DEBUG level

bq. Why is SafeModeInfo#doConsistencyCheck costly? It doesn't seem like it 
should be. If it's not in fact expensive, we might as well make it run 
regardless of whether or not asserts are enabled
You're right that it's not super expensive, but this code gets called on every 
block being reported during startup, which is a fair amount, so I chose to 
maintain the current behavior of only running the checks when asserts are 
enabled.

bq. Is there really no better way to check if assertions are enabled?
Not that I've ever found! :(
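For readers following along, the idiom alluded to here is an assert statement 
with an intentional side effect, which executes only when the JVM runs with 
{{-ea}}:

{code}
// The standard (if awkward) way to detect whether assertions are enabled:
// the assignment inside the assert only runs when -ea is on.
public class AssertsCheck {
    static boolean assertsEnabled() {
        boolean enabled = false;
        assert enabled = true; // intentional side effect; runs only under -ea
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("asserts enabled: " + assertsEnabled());
    }
}
{code}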

bq. seems like they should all be made member methods and moved to 
MiniDFSCluster... Also seems like TestEditLogTailer#waitForStandbyToCatchUp 
should be moved to MiniDFSCluster.
I'd like to move a bunch of these methods into a new {{HATestUtil}} class... 
can I do that in a follow-up JIRA?

Eli said:
bq. Nice change and tests. Nit, I'd add a comment in 
TestHASafeMode#restartStandby where the safemode extension is set indicating 
the rationale, it looked like the asserts at the end were racy because I missed 
this
Fixed

 HA: Bugs related to failover from/into safe-mode
 

 Key: HDFS-2692
 URL: https://issues.apache.org/jira/browse/HDFS-2692
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-2692.txt, hdfs-2692.txt


 In testing I saw an AssertionError come up several times when I was trying to 
 do failover between two NNs where one or the other was in safe-mode. Need to 
 write some unit tests to try to trigger this -- hunch is it has something to 
 do with the treatment of safe block count while tailing edits in safemode.





[jira] [Updated] (HDFS-2692) HA: Bugs related to failover from/into safe-mode

2011-12-29 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2692:
--

Attachment: hdfs-2692.txt

 HA: Bugs related to failover from/into safe-mode
 

 Key: HDFS-2692
 URL: https://issues.apache.org/jira/browse/HDFS-2692
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt


 In testing I saw an AssertionError come up several times when I was trying to 
 do failover between two NNs where one or the other was in safe-mode. Need to 
 write some unit tests to try to trigger this -- hunch is it has something to 
 do with the treatment of safe block count while tailing edits in safemode.





[jira] [Commented] (HDFS-2692) HA: Bugs related to failover from/into safe-mode

2011-12-29 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177368#comment-13177368
 ] 

Aaron T. Myers commented on HDFS-2692:
--

bq. I'd like to move a bunch of these methods into a new HATestUtil class... 
can I do that in a follow-up JIRA?

Definitely. This also came up in Eli's review of HDFS-2709. Please file?

+1, the latest patch looks good to me.

 HA: Bugs related to failover from/into safe-mode
 

 Key: HDFS-2692
 URL: https://issues.apache.org/jira/browse/HDFS-2692
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt


 In testing I saw an AssertionError come up several times when I was trying to 
 do failover between two NNs where one or the other was in safe-mode. Need to 
 write some unit tests to try to trigger this -- hunch is it has something to 
 do with the treatment of safe block count while tailing edits in safemode.





[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177371#comment-13177371
 ] 

Todd Lipcon commented on HDFS-2720:
---

Small nits:

{code}
+  // Now format 1st NN and copy the storage dirs to remaining all.
{code}
"to remaining all" seems like a typo; "copy the storage directory from that 
node to the others." would be better. Also, I think it's easier to read "first" 
than "1st".


{code}
+  //Start all Namenodes
{code}
add space after {{//}}



- The change to remove setRpcEngine looks unrelated - that should get cleaned 
up in trunk so it doesn't present a merge issue in the branch.

 HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 
 nameSpaceDirs to NN2 nameSpaceDirs 
 

 Key: HDFS-2720
 URL: https://issues.apache.org/jira/browse/HDFS-2720
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-2720.patch


 To keep the clusterID the same, we are copying the namespaceDirs from the 1st 
 NN to the other NNs.
 While copying these files, the in_use.lock file may not allow the copy on all 
 OSs, since the lock has been acquired on it. 
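 A minimal sketch of the kind of fix being discussed: copy the name directory 
 tree but skip in_use.lock, which the running NameNode holds a lock on. Paths 
 and helper names below are illustrative assumptions, not MiniDFSCluster code:
 {code}
// Hypothetical sketch: recursively copy a name directory, skipping the
// in_use.lock file held by the live NameNode. Not actual MiniDFSCluster code.
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class NameDirCopier {
    static void copySkippingLock(Path src, Path dst) throws IOException {
        try (Stream<Path> walk = Files.walk(src)) {
            // Files.walk is depth-first, so directories precede their contents.
            for (Path p : (Iterable<Path>) walk::iterator) {
                if (p.getFileName().toString().equals("in_use.lock")) {
                    continue; // the live NN holds this lock; don't copy it
                }
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
 {code}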





[jira] [Commented] (HDFS-2714) HA: Fix test cases which use standalone FSNamesystems

2011-12-29 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177374#comment-13177374
 ] 

Aaron T. Myers commented on HDFS-2714:
--

+1, the patch looks good to me.

 HA: Fix test cases which use standalone FSNamesystems
 -

 Key: HDFS-2714
 URL: https://issues.apache.org/jira/browse/HDFS-2714
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial
 Attachments: hdfs-2714.txt


 Several tests (eg TestEditLog, TestSaveNamespace) failed in the most recent 
 build with an NPE inside of FSNamesystem.checkOperation. These tests set up a 
 standalone FSN that isn't fully initialized. We just need to add a null check 
 to deal with this case in checkOperation.





[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs

2011-12-29 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177389#comment-13177389
 ] 

Eli Collins commented on HDFS-2720:
---

ATM and I were discussing how to initialize the SBN state yesterday. What we 
currently do is format the primary then copy the name dirs to the SBN. How 
about making the SBN do this automatically on startup? Specifically, on NN 
startup, if HA and a shared edits dir are configured and there is no local 
image, the SBN downloads the image from the primary (if the other NN is still 
in standby then it fails to start, as it does currently).

 HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 
 nameSpaceDirs to NN2 nameSpaceDirs 
 

 Key: HDFS-2720
 URL: https://issues.apache.org/jira/browse/HDFS-2720
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-2720.patch


 To keep the clusterID the same, we are copying the namespaceDirs from the 1st 
 NN to the other NNs.
 While copying these files, the in_use.lock file may not allow the copy on all 
 OSs, since the lock has been acquired on it. 





[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177434#comment-13177434
 ] 

Todd Lipcon commented on HDFS-2720:
---

That would be a nice improvement... but I think it makes sense to do this small 
fix that Uma proposed so the tests run on Windows, and then do the standby 
initialize from remote active feature separately?

 HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 
 nameSpaceDirs to NN2 nameSpaceDirs 
 

 Key: HDFS-2720
 URL: https://issues.apache.org/jira/browse/HDFS-2720
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-2720.patch


 To keep the clusterID the same, we are copying the namespaceDirs from the 1st 
 NN to the other NNs.
 While copying these files, the in_use.lock file may not allow the copy on all 
 OSs, since the lock has been acquired on it. 





[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value

2011-12-29 Thread Sho Shimauchi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177460#comment-13177460
 ] 

Sho Shimauchi commented on HDFS-1314:
-

I guess HADOOP-7910 was not yet merged into the trunk at that time.
Now it has been merged.
Could you try the same patch again?

 dfs.block.size accepts only absolute value
 --

 Key: HDFS-1314
 URL: https://issues.apache.org/jira/browse/HDFS-1314
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Karim Saadah
Assignee: Sho Shimauchi
Priority: Minor
  Labels: newbie
 Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt


 Using dfs.block.size=8388608 works 
 but dfs.block.size=8mb does not.
 Using dfs.block.size=8mb should throw some WARNING on NumberFormatException.
 (http://pastebin.corp.yahoo.com/56129)





[jira] [Resolved] (HDFS-2714) HA: Fix test cases which use standalone FSNamesystems

2011-12-29 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-2714.
---

   Resolution: Fixed
Fix Version/s: HA branch (HDFS-1623)
 Hadoop Flags: Reviewed

 HA: Fix test cases which use standalone FSNamesystems
 -

 Key: HDFS-2714
 URL: https://issues.apache.org/jira/browse/HDFS-2714
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial
 Fix For: HA branch (HDFS-1623)

 Attachments: hdfs-2714.txt


 Several tests (eg TestEditLog, TestSaveNamespace) failed in the most recent 
 build with an NPE inside of FSNamesystem.checkOperation. These tests set up a 
 standalone FSN that isn't fully initialized. We just need to add a null check 
 to deal with this case in checkOperation.





[jira] [Created] (HDFS-2730) HA: Refactor shared HA-related test code into HATestUtils class

2011-12-29 Thread Todd Lipcon (Created) (JIRA)
HA: Refactor shared HA-related test code into HATestUtils class
---

 Key: HDFS-2730
 URL: https://issues.apache.org/jira/browse/HDFS-2730
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: HA branch (HDFS-1623)


A fair number of the HA tests are sharing code like 
{{waitForStandbyToCatchUp}}, etc. We should refactor this code into an 
HATestUtils class with static methods.





[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs

2011-12-29 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177468#comment-13177468
 ] 

Eli Collins commented on HDFS-2720:
---

Yup, I'll file a separate jira. Agree wrt the fix for Windows.

 HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 
 nameSpaceDirs to NN2 nameSpaceDirs 
 

 Key: HDFS-2720
 URL: https://issues.apache.org/jira/browse/HDFS-2720
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-2720.patch


 To keep the clusterID the same, we are copying the namespaceDirs from the 1st 
 NN to the other NNs.
 While copying these files, the in_use.lock file may not allow the copy on all 
 OSs, since the lock has been acquired on it. 





[jira] [Resolved] (HDFS-2692) HA: Bugs related to failover from/into safe-mode

2011-12-29 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-2692.
---

   Resolution: Fixed
Fix Version/s: HA branch (HDFS-1623)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the reviews, Aaron and Eli. I filed HDFS-2730 
for the test util refactor.

 HA: Bugs related to failover from/into safe-mode
 

 Key: HDFS-2692
 URL: https://issues.apache.org/jira/browse/HDFS-2692
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: HA branch (HDFS-1623)

 Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt


 In testing I saw an AssertionError come up several times when I was trying to 
 do failover between two NNs where one or the other was in safe-mode. Need to 
 write some unit tests to try to trigger this -- hunch is it has something to 
 do with the treatment of safe block count while tailing edits in safemode.





[jira] [Created] (HDFS-2731) Autopopulate standby name dirs if they're empty

2011-12-29 Thread Eli Collins (Created) (JIRA)
Autopopulate standby name dirs if they're empty
---

 Key: HDFS-2731
 URL: https://issues.apache.org/jira/browse/HDFS-2731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


To set up a SBN we currently format the primary, then manually copy the name 
dirs to the SBN. The SBN should do this automatically. Specifically, on NN 
startup, if HA with a shared edits dir is configured and populated, and the SBN 
has empty name dirs, it should download the image and log from the primary (as 
an optimization it could copy the logs from the shared dir). If the other NN is 
still in standby then it should fail to start, as it does currently.





[jira] [Updated] (HDFS-2731) Autopopulate standby name dirs if they're empty

2011-12-29 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2731:
--

Description: To setup a SBN we currently format the primary then manually 
copy the name dirs to the SBN. The SBN should do this automatically. 
Specifically, on NN startup, if HA with a shared edits dir is configured and 
populated, if the SBN has empty name dirs it should downloads the image and log 
from the primary (as an optimization it could copy the logs from the shared 
dir). If the other NN is still in standby then it should fail to start as it 
does currently.  (was: To setup a SBN we currently format the primary then 
manually copy the name dirs to the SBN. The SBN should do this automatically. 
Specifically, on NN startup, if HA with a shared edits dir is configured and 
populated, if the SBN has empty name dirs it should downloads the image and log 
from the primary (as an optimization it could copy the logs from the shared 
dir). If the other NN is still in standby then it should fails to start as it 
does currently.)

 Autopopulate standby name dirs if they're empty
 ---

 Key: HDFS-2731
 URL: https://issues.apache.org/jira/browse/HDFS-2731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins

 To set up a SBN we currently format the primary, then manually copy the name 
 dirs to the SBN. The SBN should do this automatically. Specifically, on NN 
 startup, if HA with a shared edits dir is configured and populated, and the 
 SBN has empty name dirs, it should download the image and log from the 
 primary (as an optimization it could copy the logs from the shared dir). If 
 the other NN is still in standby then it should fail to start, as it does 
 currently.





[jira] [Created] (HDFS-2732) Add support for the standby in the bin scripts

2011-12-29 Thread Eli Collins (Created) (JIRA)
Add support for the standby in the bin scripts
--

 Key: HDFS-2732
 URL: https://issues.apache.org/jira/browse/HDFS-2732
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


We need to update the bin scripts to support SBNs. Two ideas:

Modify start-dfs.sh to start another copy of the NN if HA is configured. We 
could introduce a file similar to masters (2NN hosts) called standbys, which 
lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts 
active (and leave the NNs listed in standbys as is).

Or, simpler, we could just provide a start-namenode.sh script that a user can 
run to start the SBN on another host themselves. The user would manually tell 
the other NN to be active via HAAdmin (or start-dfs.sh could do that 
automatically, i.e. assume the NN it starts should be the primary).





[jira] [Created] (HDFS-2733) Document HA configuration and CLI

2011-12-29 Thread Eli Collins (Created) (JIRA)
Document HA configuration and CLI
-

 Key: HDFS-2733
 URL: https://issues.apache.org/jira/browse/HDFS-2733
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation, ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


We need to document the configuration changes in HDFS-2231 and the new CLI 
introduced by HADOOP-7774.





[jira] [Commented] (HDFS-2732) Add support for the standby in the bin scripts

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177479#comment-13177479
 ] 

Todd Lipcon commented on HDFS-2732:
---

For me, start-dfs.sh actually already works, since it uses the GetConf tool 
which prints out all of the NN addresses in the cluster based on the 
configuration. Does it not work for you?

 Add support for the standby in the bin scripts
 --

 Key: HDFS-2732
 URL: https://issues.apache.org/jira/browse/HDFS-2732
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins

 We need to update the bin scripts to support SBNs. Two ideas:
 Modify start-dfs.sh to start another copy of the NN if HA is configured. We 
 could introduce a file similar to masters (2NN hosts) called standbys which 
 lists the SBN hosts, and start-dfs.sh would automatically make the NN it 
 starts active (and leave the NNs listed in standby as is).
 Or simpler, we could just provide a start-namenode.sh script that a user can 
 run to start the SBN on another host themselves. The user would manually tell 
 the other NN to be active via HAAdmin (or start-dfs.sh could do that 
 automatically, ie assume the NN it starts should be the primary).





[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177478#comment-13177478
 ] 

Todd Lipcon commented on HDFS-2731:
---

bq. as an optimization it could copy the logs from the shared dir
I don't think it's necessarily an optimization - it might actually be _easier_ 
to implement this way :)

bq. If the other NN is still in standby then it should fail to start as it does 
currently
Can you explain what you mean by this? Why not allow it to download the image 
from the other NN anyway?

 Autopopulate standby name dirs if they're empty
 ---

 Key: HDFS-2731
 URL: https://issues.apache.org/jira/browse/HDFS-2731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins

 To set up an SBN we currently format the primary and then manually copy the 
 name dirs to the SBN. The SBN should do this automatically. Specifically, on 
 NN startup, if HA with a shared edits dir is configured and populated, and 
 the SBN has empty name dirs, it should download the image and log from the 
 primary (as an optimization it could copy the logs from the shared dir). If 
 the other NN is still in standby then it should fail to start, as it does 
 currently.





[jira] [Updated] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer

2011-12-29 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2709:
-

Attachment: HDFS-2709-HDFS-1623.patch

 HA: Appropriately handle error conditions in EditLogTailer
 --

 Key: HDFS-2709
 URL: https://issues.apache.org/jira/browse/HDFS-2709
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Aaron T. Myers
Priority: Critical
 Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, 
 HDFS-2709-HDFS-1623.patch


 Currently if the edit log tailer experiences an error replaying edits in the 
 middle of a file, it will go back to retrying from the beginning of the file 
 on the next tailing iteration. This is incorrect since many of the edits will 
 have already been replayed, and not all edits are idempotent.
 Instead, we either need to (a) support reading from the middle of a finalized 
 file (ie skip those edits already applied), or (b) abort the standby if it 
 hits an error while tailing. If (a) isn't simple, let's do (b) for now and 
 come back to (a) later, since this is a rare circumstance and it's better to 
 abort than be incorrect.





[jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer

2011-12-29 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177483#comment-13177483
 ] 

Aaron T. Myers commented on HDFS-2709:
--

Thanks a lot for the thorough review, Eli. Comments inline. I also found and 
fixed another little bug involving a potential race between the edit log 
tailer thread and edit log rolling. I'll post an updated patch in a moment.

bq. This change handles errors reading an edit from the log (the common case) 
but not when there's a failure to apply an edit (eg if there was a bug, or a 
silent corruption somehow went unnoticed). While loadEdits won't ignore (will 
throw) this exception it does get propagated up to the catch of Throwable in 
EditLogTailer#run so we effectively retry endlessly in this case. Need to 
replace the TODO(HA) comment there with code to shutdown the SBN. Feel free to 
punt to another jira.

Indeed, I had originally intended to do this as part of a separate JIRA, but 
I'm rethinking that decision. I've added some code to shutdown the SBN, and 
amended the tests to verify this behavior.
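The shut-down-on-failure behavior described above can be sketched as a tail loop that aborts instead of retrying endlessly. This is only an illustrative sketch: `doTailEdits` and `terminate` here are hypothetical stand-ins, not the actual EditLogTailer code.

```java
// Sketch: a tailer loop that shuts the standby down on a non-recoverable
// failure rather than retrying (re-applying non-idempotent edits is unsafe).
// doTailEdits and terminate are illustrative names, not HDFS's API.
public class Main {
    static boolean running = true;
    static int exitCode = 0;

    static void doTailEdits() {
        // Simulate a non-recoverable failure while applying an edit.
        throw new IllegalStateException("failed to apply edit");
    }

    static void terminate(int code, Throwable t) {
        // Stand-in for an ExitUtil/System.exit call, kept testable here.
        exitCode = code;
        running = false;
        System.out.println("terminating: " + t.getMessage());
    }

    public static void main(String[] args) {
        while (running) {
            try {
                doTailEdits();
            } catch (Throwable t) {
                // Abort the standby instead of looping back and retrying.
                terminate(1, t);
            }
        }
        System.out.println("exit code " + exitCode);
    }
}
```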

bq. How about adding a test that uses multiple shared edits dirs, and shows 
that a failure to read from one of them will cause the tailer to not catch up, 
can file a jira for a future change that is OK with faulty shared dirs as long 
as one is working.

Multiple shared edits dirs aren't currently supported or tested. That's 
certainly an obvious improvement worth doing; we should file a JIRA to add 
support and tests for it.

bq. In FileJournalManager#getNumberOfTransactions, now that we loosen the 
check to elf.containsTxId(fromTxid), isn't the last else case dead code?

Yes indeed, not sure how I missed that. Removed.

bq. I think we can remove the "TODO(HA): Should this happen when called by the 
tailer?" comment in loadEdits, right, since we always create new streams when 
we select them?

Yes indeed. Removed.

bq. Would it be simpler in LimitedEditLogAnswer#answer to spy on each stream 
and stub readOp rather than introduce LimitedEditLogInputStream?

Different? Yes. Simpler? Maybe. I did it this way because I thought creating 
spies within spies was kind of gross. I switched it to use a spy in this latest 
patch, which is at least less code. :)
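The alternative being weighed here (a LimitedEditLogInputStream-style wrapper vs. a Mockito spy) can be sketched as a plain decorator that reports end-of-stream after a cap. The names below (`EditOpStream`, `LimitedEditOpStream`) are illustrative only, not the HDFS classes.

```java
// Minimal sketch of the "limited stream" decorator idea: delegate readOp()
// until a cap is hit, then report end-of-stream, simulating a truncated log.
// EditOpStream / LimitedEditOpStream are illustrative names, not HDFS's.
import java.util.Iterator;
import java.util.List;

public class Main {
    interface EditOpStream {
        String readOp(); // returns null at end of stream
    }

    static class ListStream implements EditOpStream {
        private final Iterator<String> it;
        ListStream(List<String> ops) { this.it = ops.iterator(); }
        public String readOp() { return it.hasNext() ? it.next() : null; }
    }

    static class LimitedEditOpStream implements EditOpStream {
        private final EditOpStream delegate;
        private int remaining;
        LimitedEditOpStream(EditOpStream delegate, int limit) {
            this.delegate = delegate;
            this.remaining = limit;
        }
        public String readOp() {
            if (remaining-- <= 0) return null; // cap reached: look truncated
            return delegate.readOp();
        }
    }

    public static void main(String[] args) {
        EditOpStream s = new LimitedEditOpStream(
            new ListStream(List.of("OP_ADD", "OP_MKDIR", "OP_DELETE")), 2);
        for (String op; (op = s.readOp()) != null; ) {
            System.out.println(op); // only the first two ops are visible
        }
    }
}
```

A spy-based version would stub readOp on each real stream instead; the decorator keeps the truncation logic in one ordinary class at the cost of a little boilerplate.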

bq. How about introducing DFSHATestUtil and put waitForStandbyToCatchUp and 
CouldNotCatchUpException there? Seems like the methods you pointed out in the 
HDFS-2692 review could go there as well).

Good idea. Let's do it in a separate JIRA though, along the lines of 
consolidate generic HA test helper methods.

bq. Nit: IOException e, s/e/ioe/

Done.

bq. testFailuretoReadEdits needs a javadoc

Done.

bq. waitForStandbyToCatchUp needs a javadoc indicating it waits for 
NN_LAG_TIMEOUT then throws CouldNotCatchUp

Done.

 HA: Appropriately handle error conditions in EditLogTailer
 --

 Key: HDFS-2709
 URL: https://issues.apache.org/jira/browse/HDFS-2709
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Aaron T. Myers
Priority: Critical
 Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, 
 HDFS-2709-HDFS-1623.patch


 Currently if the edit log tailer experiences an error replaying edits in the 
 middle of a file, it will go back to retrying from the beginning of the file 
 on the next tailing iteration. This is incorrect since many of the edits will 
 have already been replayed, and not all edits are idempotent.
 Instead, we either need to (a) support reading from the middle of a finalized 
 file (ie skip those edits already applied), or (b) abort the standby if it 
 hits an error while tailing. If (a) isn't simple, let's do (b) for now and 
 come back to (a) later, since this is a rare circumstance and it's better to 
 abort than be incorrect.





[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty

2011-12-29 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177486#comment-13177486
 ] 

Eli Collins commented on HDFS-2731:
---

Wrt #1, if we get the image from the NN and the edits from the shared dir, are 
we sure they'll always match, e.g. what if we're rolling at the same time (the 
other NN could be primary and active)? I was thinking asking for both from the 
primary would mean we always get matched sets and therefore don't need to 
worry about races.

Wrt #2, yeah, I was thinking we should be explicit (we wouldn't have to worry 
about, e.g., the shared dir being populated but neither NN having populated 
name dirs, which we know won't be the case if the other is active), but on 
second thought I think your suggestion is better.

 Autopopulate standby name dirs if they're empty
 ---

 Key: HDFS-2731
 URL: https://issues.apache.org/jira/browse/HDFS-2731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins

 To set up an SBN we currently format the primary and then manually copy the 
 name dirs to the SBN. The SBN should do this automatically. Specifically, on 
 NN startup, if HA with a shared edits dir is configured and populated, and 
 the SBN has empty name dirs, it should download the image and log from the 
 primary (as an optimization it could copy the logs from the shared dir). If 
 the other NN is still in standby then it should fail to start, as it does 
 currently.





[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177488#comment-13177488
 ] 

Todd Lipcon commented on HDFS-2731:
---

The primary shouldn't be removing any old images unless it's taking 
checkpoints. But there won't be checkpoints if the standby isn't running yet 
(assuming the standby is the one doing checkpointing). So if we get the most 
recent image from the NN, then we should always have enough edits in the shared 
dir to roll forward from there.

 Autopopulate standby name dirs if they're empty
 ---

 Key: HDFS-2731
 URL: https://issues.apache.org/jira/browse/HDFS-2731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins

 To set up an SBN we currently format the primary and then manually copy the 
 name dirs to the SBN. The SBN should do this automatically. Specifically, on 
 NN startup, if HA with a shared edits dir is configured and populated, and 
 the SBN has empty name dirs, it should download the image and log from the 
 primary (as an optimization it could copy the logs from the shared dir). If 
 the other NN is still in standby then it should fail to start, as it does 
 currently.





[jira] [Commented] (HDFS-2386) with security enabled fsck calls lead to handshake_failure and hftp fails throwing the same exception in the logs

2011-12-29 Thread Rajesh Balamohan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177534#comment-13177534
 ] 

Rajesh Balamohan commented on HDFS-2386:



We are actively hitting this issue with the secondary namenode and fsck on the 
204 release. JDK 1.6.0_29, RHEL 6.1, MIT Kerberos 1.8.x; the AES-256, AES-128, 
and RC4 enctypes are enabled, and JCE is installed.


+1, we are facing this issue as well and get the following exception in the 
NameNode:


11/12/29 18:47:02 WARN mortbay.log: EXCEPTION
javax.net.ssl.SSLHandshakeException: Invalid padding
at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1699)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:852)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1165)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1149)
at 
org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:708)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.crypto.BadPaddingException: Padding length invalid: 238
at 
com.sun.net.ssl.internal.ssl.CipherBox.removePadding(CipherBox.java:399)
at com.sun.net.ssl.internal.ssl.CipherBox.decrypt(CipherBox.java:247)
at 
com.sun.net.ssl.internal.ssl.InputRecord.decrypt(InputRecord.java:153)
at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:840)
... 5 more

Pasting the javax.net.debug output from the secondary namenode in case it is 
of help. I enabled javax.net.debug=all on the secondary namenode and got the 
following output:


Cipher Suite: TLS_KRB5_WITH_3DES_EDE_CBC_SHA
Compression Method: 0
Extension renegotiation_info, renegotiated_connection: empty
***
%% Created:  [Session-1, TLS_KRB5_WITH_3DES_EDE_CBC_SHA]
** TLS_KRB5_WITH_3DES_EDE_CBC_SHA
*** ServerHelloDone
*** ClientKeyExchange, Kerberos
...
...
..

*** Finished
verify_data:  { 190, 127, 20, 131, 10, 136, 84, 207, 172, 130, 31, 53 }
***
main, WRITE: TLSv1 Handshake, length = 40
main, READ: TLSv1 Alert, length = 2
main, RECV TLSv1 ALERT:  fatal, handshake_failure
main, called closeSocket()
main, handling exception: javax.net.ssl.SSLHandshakeException: Received fatal 
alert: handshake_failure
11/12/29 18:47:02 ERROR namenode.SecondaryNameNode: checkpoint: Content-Length 
header is not provided by the namenode when trying to fetch 
https://NN:50475/getimage?getimage=1


 with security enabled fsck calls lead to handshake_failure and hftp fails 
 throwing the same exception in the logs
 -

 Key: HDFS-2386
 URL: https://issues.apache.org/jira/browse/HDFS-2386
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.205.0
Reporter: Arpit Gupta







[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value

2011-12-29 Thread Harsh J (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-1314:
--

Status: Open  (was: Patch Available)

 dfs.block.size accepts only absolute value
 --

 Key: HDFS-1314
 URL: https://issues.apache.org/jira/browse/HDFS-1314
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Karim Saadah
Assignee: Sho Shimauchi
Priority: Minor
  Labels: newbie
 Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt


 Using dfs.block.size=8388608 works, 
 but dfs.block.size=8mb does not. 
 Using dfs.block.size=8mb should at least throw some WARNING on the 
 NumberFormatException. (http://pastebin.corp.yahoo.com/56129)
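A minimal sketch of the kind of suffix-aware parsing the report asks for. `parseSize` below is a hypothetical helper for illustration only, not Hadoop's actual Configuration API (which, per the report, accepts only a plain long here).

```java
// Hypothetical sketch: accept either a plain byte count ("8388608") or a
// suffixed form ("8mb"). parseSize is an illustration, not Hadoop's API.
public class Main {
    public static long parseSize(String value) {
        String v = value.trim().toLowerCase();
        long multiplier = 1L;
        if (v.endsWith("k") || v.endsWith("kb")) multiplier = 1L << 10;
        else if (v.endsWith("m") || v.endsWith("mb")) multiplier = 1L << 20;
        else if (v.endsWith("g") || v.endsWith("gb")) multiplier = 1L << 30;
        // Strip the trailing unit letters before parsing the numeric part.
        String digits = v.replaceAll("[a-z]+$", "");
        return Long.parseLong(digits) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(parseSize("8388608")); // plain byte count
        System.out.println(parseSize("8mb"));     // suffixed form, same value
    }
}
```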





[jira] [Updated] (HDFS-2716) HA: Configuration needs to allow different dfs.http.addresses for each HA NN

2011-12-29 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2716:
--

Attachment: hdfs-2716.txt

Attached patch fixes the generic conf code to handle NN IDs as well as 
Nameservice IDs.

 HA: Configuration needs to allow different dfs.http.addresses for each HA NN
 

 Key: HDFS-2716
 URL: https://issues.apache.org/jira/browse/HDFS-2716
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-2716.txt


 Earlier on the HA branch we expanded the configuration so that different IPC 
 addresses can be specified for each of the HA NNs in a cluster. But we didn't 
 do this for the HTTP address. This has proved problematic while working on 
 HDFS-2291 (checkpointing in HA).





[jira] [Updated] (HDFS-2291) HA: Checkpointing in an HA setup

2011-12-29 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2291:
--

Attachment: hdfs-2291.txt

Attached patch adds a thread to the SBN which takes checkpoints.

It doesn't currently deal with the case where a checkpoint is happening while 
the SBN needs to become active. I'm working on that now, but figured I'd put 
this patch up for early review.

This depends on HDFS-2716.

 HA: Checkpointing in an HA setup
 

 Key: HDFS-2291
 URL: https://issues.apache.org/jira/browse/HDFS-2291
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Aaron T. Myers
Assignee: Todd Lipcon
 Fix For: HA branch (HDFS-1623)

 Attachments: hdfs-2291.txt


 We obviously need to create checkpoints when HA is enabled. One thought is to 
 use a third, dedicated checkpointing node in addition to the active and 
 standby nodes. Another option would be to make the standby capable of also 
 performing the function of checkpointing.





[jira] [Resolved] (HDFS-2732) Add support for the standby in the bin scripts

2011-12-29 Thread Eli Collins (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-2732.
---

Resolution: Won't Fix

 Add support for the standby in the bin scripts
 --

 Key: HDFS-2732
 URL: https://issues.apache.org/jira/browse/HDFS-2732
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins

 We need to update the bin scripts to support SBNs. Two ideas:
 Modify start-dfs.sh to start another copy of the NN if HA is configured. We 
 could introduce a file similar to masters (2NN hosts) called standbys which 
 lists the SBN hosts, and start-dfs.sh would automatically make the NN it 
 starts active (and leave the NNs listed in standby as is).
 Or simpler, we could just provide a start-namenode.sh script that a user can 
 run to start the SBN on another host themselves. The user would manually tell 
 the other NN to be active via HAAdmin (or start-dfs.sh could do that 
 automatically, ie assume the NN it starts should be the primary).





[jira] [Commented] (HDFS-2732) Add support for the standby in the bin scripts

2011-12-29 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177538#comment-13177538
 ] 

Eli Collins commented on HDFS-2732:
---

Good point, I missed that. It doesn't work for me since I'm running both the NN 
and SBN on the same host, so the 2nd fails to start because the pid file 
already exists (the other NN already claimed the file). The log dirs would 
collide as well. In any case, I don't think we need to support the NN and SBN 
on the same host in the start scripts; developers can work around this by 
changing HADOOP_CONF_DIR and running start-dfs.sh again, or by starting just 
the NN manually with a separate conf dir, as I've been doing.

 Add support for the standby in the bin scripts
 --

 Key: HDFS-2732
 URL: https://issues.apache.org/jira/browse/HDFS-2732
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins

 We need to update the bin scripts to support SBNs. Two ideas:
 Modify start-dfs.sh to start another copy of the NN if HA is configured. We 
 could introduce a file similar to masters (2NN hosts) called standbys which 
 lists the SBN hosts, and start-dfs.sh would automatically make the NN it 
 starts active (and leave the NNs listed in standby as is).
 Or simpler, we could just provide a start-namenode.sh script that a user can 
 run to start the SBN on another host themselves. The user would manually tell 
 the other NN to be active via HAAdmin (or start-dfs.sh could do that 
 automatically, ie assume the NN it starts should be the primary).





[jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177540#comment-13177540
 ] 

Todd Lipcon commented on HDFS-2709:
---

A few thoughts on the overall approach:
- Rather than modify EditLogFileInputStream to take a startTxId, why not do the 
skipping (what you call {{setInitialPosition}}) from the caller? i.e. modify 
{{FSEditLogLoader}} to skip the transactions that have already been replayed? 
The skipping code doesn't seem specific to the input stream itself.
- I'm not convinced we need the {{partialLoadOk}} flag in 
{{FSEditLogLoader}}. IMO if the log is truncated, it's still an error as far as 
the loader is concerned - we just want to let the caller continue from where 
the error occurred. The only trick is how to go about getting the last 
successfully loaded txid out of the FSEditLogLoader in the error case -- I 
guess a member variable and a getter would work there? Do you think this ends 
up messier than the way you've done it?
- Can we add some non-HA tests that exercise 
FileJournalManager/FSEditLogLoader's ability to start mid-stream? Not sure if 
that's feasible.
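The skip-from-the-caller idea above can be sketched as a loader that tracks the last applied txid and ignores anything at or below it, so re-reading a file from the start stays safe even though edits are not idempotent. `loadEdits` and the field names here are illustrative, not the actual FSEditLogLoader API.

```java
// Sketch: track lastAppliedTxId in the loader and skip already-replayed ops,
// so a retry from the start of the file applies each transaction only once.
// Names are illustrative, not HDFS's actual FSEditLogLoader API.
import java.util.List;

public class Main {
    static long lastAppliedTxId = 0;
    static int appliedCount = 0;

    // Apply an op only if its txid is beyond what we've already replayed.
    static void loadEdits(List<Long> txids) {
        for (long txid : txids) {
            if (txid <= lastAppliedTxId) continue; // already replayed: skip
            appliedCount++;                        // "apply" the edit
            lastAppliedTxId = txid;
        }
    }

    public static void main(String[] args) {
        loadEdits(List.of(1L, 2L, 3L));     // first pass applies 3 ops
        loadEdits(List.of(1L, 2L, 3L, 4L)); // retry applies only txid 4
        System.out.println(appliedCount);   // 4, not 7
    }
}
```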

 HA: Appropriately handle error conditions in EditLogTailer
 --

 Key: HDFS-2709
 URL: https://issues.apache.org/jira/browse/HDFS-2709
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Aaron T. Myers
Priority: Critical
 Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, 
 HDFS-2709-HDFS-1623.patch


 Currently if the edit log tailer experiences an error replaying edits in the 
 middle of a file, it will go back to retrying from the beginning of the file 
 on the next tailing iteration. This is incorrect since many of the edits will 
 have already been replayed, and not all edits are idempotent.
 Instead, we either need to (a) support reading from the middle of a finalized 
 file (ie skip those edits already applied), or (b) abort the standby if it 
 hits an error while tailing. If (a) isn't simple, let's do (b) for now and 
 come back to (a) later, since this is a rare circumstance and it's better to 
 abort than be incorrect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered

2011-12-29 Thread J.Andreina (Created) (JIRA)
Even if we configure the property fs.checkpoint.size in both core-site.xml and 
hdfs-site.xml, the values are not being considered


 Key: HDFS-2734
 URL: https://issues.apache.org/jira/browse/HDFS-2734
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0, 0.20.1
Reporter: J.Andreina
Priority: Minor


Even if we configure the property fs.checkpoint.size in both core-site.xml and 
hdfs-site.xml, the values are not being considered





[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered

2011-12-29 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177580#comment-13177580
 ] 

Harsh J commented on HDFS-2734:
---

Hi J.Andreina,

That property is for up to 0.20/1.0 SecondaryNameNodes. It is OK for it to be 
in core-site.xml.

What exact version are you reporting this for? What do you see in 
SNN_HOST:50090/conf?

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml, the values are not being considered
 

 Key: HDFS-2734
 URL: https://issues.apache.org/jira/browse/HDFS-2734
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.23.0
Reporter: J.Andreina
Priority: Minor

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml, the values are not being considered
