[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode

2012-04-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259394#comment-13259394
 ] 

dhruba borthakur commented on HDFS-3092:


I am trying to digest the meat of this approach, but one question that I do not 
have an answer for: is it possible for the journal daemon to write data to 
disks and nodes that do not share load from other non-journal writers? I feel 
this requirement will be critical to ensure low variance of write latencies for 
the journal. My experience is that a 5% increase in the latency of writes to 
the transaction log causes a 20% degradation of namenode throughput in a large 
cluster.

 Enable journal protocol based editlog streaming for standby namenode
 

 Key: HDFS-3092
 URL: https://issues.apache.org/jira/browse/HDFS-3092
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, name-node
Affects Versions: 0.24.0, 0.23.3
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: ComparisonofApproachesforHAJournals.pdf, 
 MultipleSharedJournals.pdf, MultipleSharedJournals.pdf, 
 MultipleSharedJournals.pdf


 Currently the standby namenode relies on reading shared editlogs to stay 
 current with the active namenode for namespace changes. BackupNode used 
 streaming edits from the active namenode to do the same. This jira is to 
 explore using journal protocol based editlog streams for the standby 
 namenode. A daemon in the standby will get the editlogs from the active and 
 write them to local edits. To begin with, the existing standby mechanism of 
 reading from a file will continue to be used, but it will read from the 
 local edits instead of from the shared edits.
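
The flow in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: `LocalEditsSketch`, `LocalEditsWriter`, `onJournal`, `readLocalEdits`, and `roundTrip` are hypothetical names, not Hadoop classes or the journal protocol API, and the network transport from the active namenode is left out entirely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; none of these names come from Hadoop.
class LocalEditsSketch {

    // Stands in for the daemon on the standby: appends each batch of edit
    // records received from the active namenode to a local edits file.
    static final class LocalEditsWriter implements AutoCloseable {
        private final FileChannel channel;

        LocalEditsWriter(Path localEdits) {
            try {
                channel = FileChannel.open(localEdits,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Called once per batch streamed from the active namenode (assumed).
        void onJournal(byte[] records) {
            try {
                ByteBuffer buf = ByteBuffer.wrap(records);
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
                // Force the data to disk before the batch counts as written.
                channel.force(false);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public void close() {
            try {
                channel.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Stands in for the existing standby mechanism, which reads edits from a
    // file; here the file is the local edits rather than the shared edits.
    static byte[] readLocalEdits(Path localEdits) {
        try {
            return Files.readAllBytes(localEdits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streams two batches into a temporary local edits file and reads them back.
    static byte[] roundTrip(byte[] first, byte[] second) {
        try {
            Path localEdits = Files.createTempFile("local-edits", ".log");
            try (LocalEditsWriter writer = new LocalEditsWriter(localEdits)) {
                writer.onJournal(first);
                writer.onJournal(second);
            }
            byte[] result = readLocalEdits(localEdits);
            Files.delete(localEdits);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] edits = roundTrip(new byte[] {1, 2}, new byte[] {3});
        System.out.println("bytes in local edits: " + edits.length);
    }
}
```

The `force` call on every batch is the step whose latency depends on what else shares the disk, which is why a sketch like this keeps the write path small and separate from the reader.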

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3307) when save FSImage ,HDFS( or SecondaryNameNode or FSImage)can't handle some file whose file name has some special messy code(乱码)

2012-04-22 Thread yixiaohua (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259396#comment-13259396
 ] 

yixiaohua commented on HDFS-3307:
-

Dear Todd:
   乱码 is not the string that causes the problem. It is the Chinese word for 
garbled characters, which I did not know how to describe in English. I have 
placed the string that causes the problem, and the test code, in the 
attachments. I look forward to your reply. Best wishes!

 when save FSImage  ,HDFS( or  SecondaryNameNode or FSImage)can't handle some 
 file whose file name has some special messy code(乱码)
 -

 Key: HDFS-3307
 URL: https://issues.apache.org/jira/browse/HDFS-3307
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
 Attachments: FSImage.java, ProblemString.txt, 
 TestUTF8AndStringGetBytes.java

   Original Estimate: 12h
  Remaining Estimate: 12h

 This is the log information of the exception from the SecondaryNameNode:
 2012-03-28 00:48:42,553 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Found lease for non-existent file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@???
 ??tor.qzone.qq.com/keypart-00174
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
 at java.lang.Thread.run(Thread.java:619)
 This is the log information about the file from the namenode:
 2012-03-28 00:32:26,528 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34 cmd=create 
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 dst=null perm=boss:boss:rw-r--r--
 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174. blk_2751836614265659170_184668759
 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 is closed by DFSClient_attempt_201203271849_0016_r_000174_0
 2012-03-28 00:37:50,315 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34 cmd=rename 
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 
 dst=/user/boss/pgv/fission/task16/split/  @?
 tor.qzone.qq.com/keypart-00174  perm=boss:boss:rw-r--r--
 After checking the code that saves the FSImage, I found a problem that may 
 be a bug in the HDFS code. I paste it below.
 This is the saveFSImage method in FSImage.java; I have marked the problem 
 lines:
   /**
    * Save the contents of the FS image to the file.
    */
   void saveFSImage(File newFile) throws IOException {
     FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
     FSDirectory fsDir = fsNamesys.dir;
     long startTime = FSNamesystem.now();
     //
     // Write out data
     //
     DataOutputStream out = new DataOutputStream(
         new BufferedOutputStream(
             new FileOutputStream(newFile)));
     try {
       ...

       // save the rest of the nodes
       saveImage(strbuf, 0, fsDir.rootDir, out);    // <-- problem
       fsNamesys.saveFilesUnderConstruction(out);   // <-- problem, detail is below
       strbuf = null;
     } finally {
       out.close();
     }
     LOG.info("Image file of size " + newFile.length() + " saved in "
         + (FSNamesystem.now() - startTime)/1000 + " seconds.");
   }

   /**
    * Save file tree image starting from the given root.
    * This is a recursive procedure, which first saves all children of
    * a current directory and then moves inside the sub-directories.
    */
   private static void saveImage(ByteBuffer 

[jira] [Updated] (HDFS-3307) when save FSImage ,HDFS( or SecondaryNameNode or FSImage)can't handle some file whose file name has some special messy code(乱码)

2012-04-22 Thread yixiaohua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yixiaohua updated HDFS-3307:


Attachment: ProblemString.txt
TestUTF8AndStringGetBytes.java

 when save FSImage  ,HDFS( or  SecondaryNameNode or FSImage)can't handle some 
 file whose file name has some special messy code(乱码)
 -

 Key: HDFS-3307
 URL: https://issues.apache.org/jira/browse/HDFS-3307
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
 Attachments: FSImage.java, ProblemString.txt, 
 TestUTF8AndStringGetBytes.java

   Original Estimate: 12h
  Remaining Estimate: 12h


[jira] [Updated] (HDFS-3307) when save FSImage ,HDFS( or SecondaryNameNode or FSImage)can't handle some file whose file name has some special messy code(乱码)

2012-04-22 Thread yixiaohua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yixiaohua updated HDFS-3307:


Attachment: TestUTF8AndStringGetBytes.java

 when save FSImage  ,HDFS( or  SecondaryNameNode or FSImage)can't handle some 
 file whose file name has some special messy code(乱码)
 -

 Key: HDFS-3307
 URL: https://issues.apache.org/jira/browse/HDFS-3307
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
 Attachments: FSImage.java, ProblemString.txt, 
 TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java

   Original Estimate: 12h
  Remaining Estimate: 12h


[jira] [Commented] (HDFS-3307) when save FSImage ,HDFS( or SecondaryNameNode or FSImage)can't handle some file whose file name has some special messy code(乱码)

2012-04-22 Thread yixiaohua (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259400#comment-13259400
 ] 

yixiaohua commented on HDFS-3307:
-

I am trying to figure out the UTF8 problem ~_~
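
One way a "UTF8 versus String.getBytes" mismatch can arise in plain Java is sketched below. It is offered as an assumption about the kind of problem involved, not as the confirmed cause of this issue and not as the content of the attached TestUTF8AndStringGetBytes.java. The class `Utf8EncodingDemo` and its methods are hypothetical names. The sketch compares standard UTF-8 from `String.getBytes` with the modified UTF-8 written by `DataOutputStream.writeUTF`; the two agree for characters in the Basic Multilingual Plane but differ for characters stored as a surrogate pair.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hadoop and not the attached test.
class Utf8EncodingDemo {

    // Number of bytes in the standard UTF-8 encoding of s.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of payload bytes in Java's modified UTF-8 encoding of s, as
    // written by DataOutputStream.writeUTF. The 2-byte length prefix that
    // writeUTF emits first is excluded from the count.
    static int modifiedUtf8Length(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(s);
            }
            return buf.size() - 2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The two BMP characters from the issue title, written as escapes.
        String bmp = "\u4e71\u7801";
        // U+1F600, a supplementary character stored as one surrogate pair.
        String supplementary = "\uD83D\uDE00";
        for (String s : new String[] {bmp, supplementary}) {
            System.out.println(standardUtf8Length(s) + " bytes standard, "
                    + modifiedUtf8Length(s) + " bytes modified");
        }
    }
}
```

For the BMP string both encodings take 6 bytes, while for the surrogate pair standard UTF-8 takes 4 bytes and modified UTF-8 takes 6. If one code path writes a path with one encoding and another code path compares it against the other, a file name containing such a character would no longer match byte for byte.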

 when save FSImage  ,HDFS( or  SecondaryNameNode or FSImage)can't handle some 
 file whose file name has some special messy code(乱码)
 -

 Key: HDFS-3307
 URL: https://issues.apache.org/jira/browse/HDFS-3307
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
 Attachments: FSImage.java, ProblemString.txt, 
 TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java

   Original Estimate: 12h
  Remaining Estimate: 12h
