[jira] [Commented] (HDFS-3307) when saving the FSImage, HDFS (or the SecondaryNameNode or FSImage) can't handle some files whose file names contain garbled characters (乱码)

2012-04-20 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258331#comment-13258331
 ] 

Todd Lipcon commented on HDFS-3307:
---

Rather than change the code to not use UTF8, I think we should figure out why 
the UTF8 writeString function is writing the wrong data. Is 乱码 the string 
that causes the problem? I tried to reproduce using this string, but it works 
fine here.

(I did hadoop fs -put /etc/issue '乱码', then successfully restarted and catted 
the file)
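For reference, the round-trip Todd describes can also be checked at the encoding layer. The sketch below is illustrative only: roundTrip is a hypothetical helper built on Java's DataOutput.writeUTF, the "modified UTF-8" scheme that the legacy org.apache.hadoop.io.UTF8 class is modeled on; it is not Hadoop code.

```java
import java.io.*;

public class Utf8RoundTrip {
    // Round-trip a string through DataOutput.writeUTF/readUTF, the same
    // "modified UTF-8" family of encodings the legacy UTF8 class is based on.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF(s);
            return new DataInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readUTF();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // BMP characters such as the reported filename survive intact,
        // matching Todd's observation that 乱码 works fine.
        System.out.println(roundTrip("乱码").equals("乱码"));
        // Supplementary characters also round-trip within Java, but their
        // on-disk bytes (surrogate pairs, CESU-8 style) differ from standard
        // UTF-8, which strict external decoders may reject.
        String nonBmp = new String(Character.toChars(0x1F600));
        System.out.println(roundTrip(nonBmp).equals(nonBmp));
    }
}
```

If the reporter's filename contained bytes that were never valid UTF-8 to begin with (the "@???" residue in the log suggests so), the failure would be in the original encoding of the name, not in this round-trip.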

 when saving the FSImage, HDFS (or the SecondaryNameNode or FSImage) can't handle 
 some files whose file names contain garbled characters (乱码)
 -

 Key: HDFS-3307
 URL: https://issues.apache.org/jira/browse/HDFS-3307
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
 Attachments: FSImage.java

   Original Estimate: 12h
  Remaining Estimate: 12h

 this is the log information of the exception from the SecondaryNameNode: 
 2012-03-28 00:48:42,553 ERROR 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: 
 java.io.IOException: Found lease for
  non-existent file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@???
 ??tor.qzone.qq.com/keypart-00174
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
 at java.lang.Thread.run(Thread.java:619)
 this is the log information about the file from the namenode:
 2012-03-28 00:32:26,528 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss 
 ip=/10.131.16.34 cmd=create  
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 dst=null
 perm=boss:boss:rw-r--r--
 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.allocateBlock: 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174. 
 blk_2751836614265659170_184668759
 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
 NameSystem.completeFile: file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 is closed by 
 DFSClient_attempt_201203271849_0016_r_000174_0
 2012-03-28 00:37:50,315 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss 
 ip=/10.131.16.34 cmd=rename  
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 
 dst=/user/boss/pgv/fission/task16/split/  @?
 tor.qzone.qq.com/keypart-00174  perm=boss:boss:rw-r--r--
 After checking the code that saves the FSImage, I found a problem that may be 
 a bug in the HDFS code; I paste it below:
 this is the saveFSImage method in FSImage.java; I marked the problem code:
 /**
  * Save the contents of the FS image to the file.
  */
 void saveFSImage(File newFile) throws IOException {
   FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
   FSDirectory fsDir = fsNamesys.dir;
   long startTime = FSNamesystem.now();
   //
   // Write out data
   //
   DataOutputStream out = new DataOutputStream(
       new BufferedOutputStream(
           new FileOutputStream(newFile)));
   try {
     ...
     // save the rest of the nodes
     saveImage(strbuf, 0, fsDir.rootDir, out);    // <-- problem
     fsNamesys.saveFilesUnderConstruction(out);   // <-- problem, detail is below
     strbuf = null;
   } finally {
     out.close();
   }
   LOG.info("Image file of size " + newFile.length() + " saved in "
       + (FSNamesystem.now() - startTime)/1000 +

[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode

2012-04-20 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258512#comment-13258512
 ] 

Todd Lipcon commented on HDFS-3092:
---

Can you clarify a few things in this document?

- In ParallelWritesWithBarrier, what happens to the journals that time out or 
fail? It seems you need to mark them as failed in ZK or something in 
order to be correct. But if you do that, why do you need Q to be a quorum? 
Q=1 should suffice for correctness, and Q=2 should suffice in order to always 
be available to recover.

It seems the protocol should be closer to:
1) send out write request to all active JNs
2) wait until all respond, or a configurable timeout
3) any that do not respond are marked as failed in ZK
4) If the remaining number of JNs is sufficient (I'd guess 2) then succeed the 
write. Otherwise fail the write and abort.
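The four steps above might be sketched roughly as follows; Journal, send, and markFailedInZk are illustrative names, not real JournalNode or ZooKeeper APIs:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the proposed write path: send to all active journal nodes, wait
// up to a timeout for each, mark stragglers failed in ZK, and succeed only if
// enough journals remain.
public class BarrierWriteSketch {
    interface Journal {
        Future<Void> send(byte[] edits);
        void markFailedInZk();
    }

    static boolean write(List<Journal> active, byte[] edits,
                         long timeoutMs, int minJournals) {
        Map<Journal, Future<Void>> pending = new HashMap<>();
        for (Journal j : active) pending.put(j, j.send(edits));     // step 1

        List<Journal> failed = new ArrayList<>();
        for (Map.Entry<Journal, Future<Void>> e : pending.entrySet()) {
            try {
                e.getValue().get(timeoutMs, TimeUnit.MILLISECONDS); // step 2
            } catch (Exception timeoutOrError) {
                e.getKey().markFailedInZk();                        // step 3
                failed.add(e.getKey());
            }
        }
        active.removeAll(failed);
        return active.size() >= minJournals;                        // step 4
    }
}
```

Note the sketch waits per journal, so the total wall-clock wait can exceed one timeout; a real implementation would use a shared deadline.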

The recovery protocol here is also a little tricky. I haven't seen a 
description of the specifics - there are a number of cases to handle - eg even 
if a write appears to fail from the perspective of the writer, it may have 
actually succeeded. Another situation: what happens if the writer crashes 
between step 2 and step 3 (so the JNs have differing number of txns, but ZK 
indicates they're all up to date?) 


Regarding quorum commits:
bq. b. The journal set is fixed in the config. Hard to add/replace hardware.
There are protocols that could be used to change the quorum size/membership at 
runtime. They do add complexity, though, so I think they should be seen as a 
future improvement - but not be discounted as impossible.
Another point is that hardware replacement can easily be treated the same as a 
full crash and loss of disk. If one node completely crashes, a new node could 
be brought in with the same hostname with no complicated protocols.
Adding or removing nodes shouldn't be hard to support during a downtime window, 
which I think satisfies most use cases pretty well.


Regarding bookkeeper:
- other operational concerns aren't mentioned: eg it doesn't use Hadoop 
metrics, doesn't use the same style of configuration files, daemon scripts, 
etc. 

 Enable journal protocol based editlog streaming for standby namenode
 

 Key: HDFS-3092
 URL: https://issues.apache.org/jira/browse/HDFS-3092
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, name-node
Affects Versions: 0.24.0, 0.23.3
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: ComparisonofApproachesforHAJournals.pdf, 
 MultipleSharedJournals.pdf, MultipleSharedJournals.pdf, 
 MultipleSharedJournals.pdf


 Currently the standby namenode relies on reading shared editlogs to stay 
 current with the active namenode for namespace changes. The BackupNode used 
 streaming of edits from the active namenode for the same purpose. This jira is 
 to explore using journal protocol based editlog streams for the standby 
 namenode. A daemon in the standby will get the editlogs from the active and 
 write them to local edits. To begin with, the existing standby mechanism of 
 reading from a file will continue to be used, but reading from the local 
 edits instead of the shared edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3305) GetImageServlet should consider SBN a valid requestor in a secure HA setup

2012-04-19 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257708#comment-13257708
 ] 

Todd Lipcon commented on HDFS-3305:
---

+1 pending jenkins

 GetImageServlet should consider SBN a valid requestor in a secure HA setup
 

 Key: HDFS-3305
 URL: https://issues.apache.org/jira/browse/HDFS-3305
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, name-node
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3305.patch


 Right now only the NN and 2NN are considered valid requestors. This won't 
 work if the ANN and SBN use distinct principal names.





[jira] [Commented] (HDFS-3271) src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc

2012-04-18 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256802#comment-13256802
 ] 

Todd Lipcon commented on HDFS-3271:
---

It's not a bug in the library, but rather a bug in one of the nss backends 
(sssd). Plus, that would require re-building the native libs on every different 
version of EL6, whereas right now a single binary works against any EL6 release.

 src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc
 --

 Key: HDFS-3271
 URL: https://issues.apache.org/jira/browse/HDFS-3271
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

 Use the re-entrant versions of these functions rather than using locking





[jira] [Commented] (HDFS-3290) Use a better local directory layout for the datanode

2012-04-18 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256999#comment-13256999
 ] 

Todd Lipcon commented on HDFS-3290:
---

It doesn't do a search for a block. The DN keeps the block map in memory.

But, I do think this is a good idea, as it will make it easier in the future to 
avoid having to keep the block map in memory on the DNs.

 Use a better local directory layout for the datanode
 

 Key: HDFS-3290
 URL: https://issues.apache.org/jira/browse/HDFS-3290
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

 When the HDFS DataNode stores chunks in a local directory, it currently puts 
 all of the chunk files into either one big directory, or a collection of 
 directories.  However, there is no way to know which directory a given block 
 will end up in, given its ID.  As the number of files increases, this does 
 not scale well.
 Similar to the git version control system, HDFS should create a few different 
 top level directories keyed off of a few bits in the chunk ID.  Git uses 8 
 bits.  This substantially cuts down on the number of chunk files in the same 
 directory and gives increased performance, while not compromising O(1) lookup 
 of chunks.
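A minimal sketch of the kind of fan-out described above (the dirFor helper and "subdir" naming are assumptions for illustration, not the actual DataNode layout):

```java
public class BlockDirLayout {
    // Hypothetical helper: derive a subdirectory name from the low 8 bits of
    // a block ID, similar to git's objects/ab/... fan-out. With 8 bits the
    // blocks spread over 256 top-level directories, and the directory for a
    // given ID is computable in O(1) without scanning the disk.
    static String dirFor(long blockId) {
        int bucket = (int) (blockId & 0xFF);        // low 8 bits -> 0..255
        return String.format("subdir%02x", bucket); // e.g. "subdir2f"
    }

    public static void main(String[] args) {
        System.out.println(dirFor(2751836614265659170L));
    }
}
```

Because the mapping is a pure function of the ID, a DN could in principle locate a block file without holding the full block map in memory, which is the future benefit Todd mentions.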





[jira] [Commented] (HDFS-3271) src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc

2012-04-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255347#comment-13255347
 ] 

Todd Lipcon commented on HDFS-3271:
---

bq. Or test for POSIX compliance at configure time...

Testing for presence of a race is tricky.

bq. (FWIW, I'm still mostly convinced that this ugly hack was a waste of time 
for the majority of folks.)

Sure, but for the folks who needed it, it saved their clusters from constant 
segfaults.

 src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc
 --

 Key: HDFS-3271
 URL: https://issues.apache.org/jira/browse/HDFS-3271
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

 Use the re-entrant versions of these functions rather than using locking





[jira] [Commented] (HDFS-2631) Rewrite fuse-dfs to use the webhdfs protocol

2012-04-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255350#comment-13255350
 ] 

Todd Lipcon commented on HDFS-2631:
---

That seems reasonable. I think it's a given that we need to keep the original 
libhdfs for performance. Having a libhdfs-alike that goes over HTTP seems 
reasonable enough but not always preferable. To speak to each of the original 
points:


bq. Compatibility - allows a single fuse client to work across server versions

We need to address compatibility for clients in general. Our Java client (and 
hence libhdfs) need this just as much as fuse.

bq. Works with both WebHDFS and Hoop since they are protocol compatible

I guess this is an advantage, but given that libhdfs already wraps arbitrary 
hadoop filesystems, we already have this capability.

bq. Removes the overhead related to libhdfs (forking a jvm)

fuse is a long-running client, so the fork overhead seems minimal. Recent 
improvements in libhdfs have also cut out most of the copying overhead.

bq. Makes it easier to support features like security

Perhaps - but libhdfs needs security anyway, so I don't think it buys us much.

 Rewrite fuse-dfs to use the webhdfs protocol
 

 Key: HDFS-2631
 URL: https://issues.apache.org/jira/browse/HDFS-2631
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/fuse-dfs
Reporter: Eli Collins
Assignee: Jaimin D Jetly

 We should port the implementation of fuse-dfs to use the webhdfs protocol. 
 This has a number of benefits:
 * Compatibility - allows a single fuse client to work across server versions
 * Works with both WebHDFS and Hoop since they are protocol compatible
 * Removes the overhead related to libhdfs (forking a jvm)
 * Makes it easier to support features like security





[jira] [Commented] (HDFS-3285) Null pointer exception at ClientNamenodeProtocolTranslatorPB while running fetchdt

2012-04-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255599#comment-13255599
 ] 

Todd Lipcon commented on HDFS-3285:
---

Dup of HDFS-2956?

 Null pointer exception at ClientNamenodeProtocolTranslatorPB while running 
 fetchdt 
 ---

 Key: HDFS-3285
 URL: https://issues.apache.org/jira/browse/HDFS-3285
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: Brahma Reddy Battula
Priority: Minor
 Fix For: 2.0.0, 3.0.0


 Scenario:
 
 Run the following command:
 ./hdfs fetchdt http://**:50070
 Then I get the following NullPointerException:
 {noformat}
 Exception in thread "main" java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:771)
   at 
 org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:650)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:766)
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher$1.run(DelegationTokenFetcher.java:191)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.main(DelegationTokenFetcher.java:144)
 {noformat}





[jira] [Commented] (HDFS-3268) Hdfs mishandles token service incompatible with HA

2012-04-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254916#comment-13254916
 ] 

Todd Lipcon commented on HDFS-3268:
---

+1, will commit this momentarily. Thanks, Daryn.

 Hdfs mishandles token service  incompatible with HA
 

 Key: HDFS-3268
 URL: https://issues.apache.org/jira/browse/HDFS-3268
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs client
Affects Versions: 0.24.0, 2.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-3268-1.patch, HDFS-3268.patch


 The {{Hdfs AbstractFileSystem}} is overwriting the token service set by the 
 {{DFSClient}}.  The service is not necessarily the correct one since 
 {{DFSClient}} is responsible for the service.  Most importantly, this 
 improper behavior is overwriting the HA logical service which indirectly 
 renders {{FileContext}} incompatible with HA.





[jira] [Commented] (HDFS-3284) bootstrapStandby fails in secure cluster

2012-04-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254934#comment-13254934
 ] 

Todd Lipcon commented on HDFS-3284:
---

ah, the addSecurityConfiguration function is only on the auto-HA branch. Let me 
pull that into this patch as well.

 bootstrapStandby fails in secure cluster
 

 Key: HDFS-3284
 URL: https://issues.apache.org/jira/browse/HDFS-3284
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hdfs-3284.txt


 HDFS-3247 improved bootstrapStandby to check if the other NN is in active 
 state before trying to bootstrap. But, it forgot to set up the kerberos 
 principals in the config before doing so. So, bootstrapStandby now fails with 
 "Failed to specify server's Kerberos principal name" in a secure cluster. 
 (Credit to Stephen Chu for finding this)





[jira] [Commented] (HDFS-3282) Expose getFileLength API.

2012-04-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254963#comment-13254963
 ] 

Todd Lipcon commented on HDFS-3282:
---

Nicholas: despite us advertising DFSDataInputStream as a private API, I imagine 
this change would break people. Could we instead just add a new interface which 
would be implemented by the existing class?
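The compatibility-preserving alternative Todd suggests could look roughly like this; all names below are hypothetical stand-ins, not the real DFSDataInputStream API:

```java
// Sketch: introduce a new interface and have the existing stream class
// implement it, so callers that cast to the old class keep working while new
// callers program against the interface.
public class InterfaceExtraction {
    interface HasFileLength {
        long getFileLength();
    }

    // Stands in for the existing DFSDataInputStream-like class; it gains the
    // interface without changing its public surface.
    static class LegacyStream implements HasFileLength {
        private final long len;
        LegacyStream(long len) { this.len = len; }
        public long getFileLength() { return len; }
    }

    public static void main(String[] args) {
        HasFileLength s = new LegacyStream(42);
        System.out.println(s.getFileLength());
    }
}
```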

 Expose getFileLength API.
 -

 Key: HDFS-3282
 URL: https://issues.apache.org/jira/browse/HDFS-3282
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs client
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G

 This JIRA is to expose the getFileLength API through a new public 
 DistributedFileSystemInfo class.
 I would appreciate it if someone could suggest a good name for this public class.
 Nicholas, did you plan any special design for this public client class?





[jira] [Commented] (HDFS-3161) 20 Append: Excluded DN replica from recovery should be removed from DN.

2012-04-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255026#comment-13255026
 ] 

Todd Lipcon commented on HDFS-3161:
---

Hi Uma/Vinay.

I ran into an issue like this without use of append():

- Client writing blk_N_GS1 to DN1, DN9, DN10
- Pipeline failed. commitBlockSynchronization succeeded with DN9 and DN10, sets 
gs to blk_N_GS2
- Client closes the pipeline
- NN issues replication request of blk_N_GS2 from DN9 to DN1
- DN1 already has blk_N_GS1 in its ongoingCreates map

I'm not sure if this can cause any serious issue with the block (it didn't in 
my case), but I agree that, if a replication request happens for a block with a 
higher genstamp, it should interrupt the old block's ongoingCreate. If the 
replication request is a lower genstamp, it should be ignored.
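The genstamp rule in the last paragraph can be sketched as a small decision function (illustrative only; this is not actual DataNode code, and the names are assumptions):

```java
public class ReplicationRequestSketch {
    // When a replication request arrives for a block the DN is still tracking
    // in ongoingCreates, compare generation stamps to decide what to do.
    enum Action { INTERRUPT_OLD_AND_ACCEPT, IGNORE, ACCEPT }

    static Action onReplicationRequest(long requestGenStamp, Long ongoingGenStamp) {
        if (ongoingGenStamp == null) {
            return Action.ACCEPT;                  // nothing in flight
        }
        if (requestGenStamp > ongoingGenStamp) {
            return Action.INTERRUPT_OLD_AND_ACCEPT; // newer genstamp wins
        }
        return Action.IGNORE;                       // stale request
    }
}
```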

 20 Append: Excluded DN replica from recovery should be removed from DN.
 ---

 Key: HDFS-3161
 URL: https://issues.apache.org/jira/browse/HDFS-3161
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: suja s
Priority: Critical
 Fix For: 1.0.3


 1) DN1-DN2-DN3 are in the pipeline.
 2) Client is killed abruptly.
 3) One DN has restarted, say DN3.
 4) In DN3, info.wasRecoveredOnStartup() will be true.
 5) NN recovery is triggered; DN3 is skipped from recovery due to the above check.
 6) Now DN1 and DN2 have blocks with generation stamp 2, DN3 has an older 
 generation stamp, say 1, and DN3 still has this block entry in 
 ongoingCreates.
 7) As part of recovery the file is closed with only two live replicas (from 
 DN1 and DN2).
 8) So the NN issues a replication command. Now DN3 also has the replica 
 with the newer generation stamp.
 9) Now DN3 contains 2 replicas on disk, and one entry in ongoingCreates 
 referring to the blocksBeingWritten directory.
 When we call append/leaseRecovery, it may again skip this node for that 
 recovery, as the blockId entry is still present in ongoingCreates with startup 
 recovery true.
 It may keep up this dance for every recovery, and this stale replica will not 
 be cleaned until we restart the cluster. The actual replica will be 
 transferred to this node only through the replication process.
 Also, the replicated blocks will be unnecessarily invalidated after subsequent 
 recoveries.





[jira] [Commented] (HDFS-3284) bootstrapStandby fails in secure cluster

2012-04-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255101#comment-13255101
 ] 

Todd Lipcon commented on HDFS-3284:
---

The test failure is unrelated (this patch doesn't touch that area of the code)

 bootstrapStandby fails in secure cluster
 

 Key: HDFS-3284
 URL: https://issues.apache.org/jira/browse/HDFS-3284
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, security
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: hdfs-3284.txt, hdfs-3284.txt


 HDFS-3247 improved bootstrapStandby to check if the other NN is in active 
 state before trying to bootstrap. But, it forgot to set up the kerberos 
 principals in the config before doing so. So, bootstrapStandby now fails with 
 "Failed to specify server's Kerberos principal name" in a secure cluster. 
 (Credit to Stephen Chu for finding this)





[jira] [Commented] (HDFS-2631) Rewrite fuse-dfs to use the webhdfs protocol

2012-04-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255255#comment-13255255
 ] 

Todd Lipcon commented on HDFS-2631:
---

I'm a little confused: why is this a good idea? Seems like it's likely to end 
up much slower than the current implementation. I'd prefer it as another 
option, rather than a rewrite.

 Rewrite fuse-dfs to use the webhdfs protocol
 

 Key: HDFS-2631
 URL: https://issues.apache.org/jira/browse/HDFS-2631
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/fuse-dfs
Reporter: Eli Collins
Assignee: Jaimin D Jetly

 We should port the implementation of fuse-dfs to use the webhdfs protocol. 
 This has a number of benefits:
 * Compatibility - allows a single fuse client to work across server versions
 * Works with both WebHDFS and Hoop since they are protocol compatible
 * Removes the overhead related to libhdfs (forking a jvm)
 * Makes it easier to support features like security





[jira] [Commented] (HDFS-3042) Automatic failover support for NN HA

2012-04-13 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253883#comment-13253883
 ] 

Todd Lipcon commented on HDFS-3042:
---

Sanjay: this is being done in a branch... it's in branches/HDFS-3042 in SVN.

 Automatic failover support for NN HA
 

 Key: HDFS-3042
 URL: https://issues.apache.org/jira/browse/HDFS-3042
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 HDFS-1623 was the umbrella task for implementation of NN HA capabilities. 
 However, it only focused on manually-triggered failover.
 Given that the HDFS-1623 branch will be merged shortly, I'm opening this JIRA 
 to consolidate/track subtasks for automatic failover support and related 
 improvements.





[jira] [Commented] (HDFS-2708) Stats for the # of blocks per DN

2012-04-13 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253903#comment-13253903
 ] 

Todd Lipcon commented on HDFS-2708:
---

+1 pending jenkins

 Stats for the # of blocks per DN
 

 Key: HDFS-2708
 URL: https://issues.apache.org/jira/browse/HDFS-2708
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
Priority: Minor
 Attachments: HDFS-2708.patch


 It would be useful for tools to be able to retrieve the total number of 
 blocks on each datanode.





[jira] [Commented] (HDFS-3280) DFSOutputStream.sync should not be synchronized

2012-04-13 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253941#comment-13253941
 ] 

Todd Lipcon commented on HDFS-3280:
---

Verified that this increased my benchmark performance by a factor of two.

 DFSOutputStream.sync should not be synchronized
 ---

 Key: HDFS-3280
 URL: https://issues.apache.org/jira/browse/HDFS-3280
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-3280.txt


 HDFS-895 added an optimization to make hflush() much faster by 
 unsynchronizing it. But, we forgot to un-synchronize the deprecated 
 {{sync()}} wrapper method. This makes the HBase WAL really slow on 0.23+ 
 since it doesn't take advantage of HDFS-895 anymore.





[jira] [Commented] (HDFS-3280) DFSOutputStream.sync should not be synchronized

2012-04-13 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253951#comment-13253951
 ] 

Todd Lipcon commented on HDFS-3280:
---

bq. Ah, so this explains what you guys thought might be an interaction with 
Nagle?

Yep, turned out to be much simpler :)

The patch failed on Hudson due to HDFS-3034 having removed the deprecated 
method. I'll commit this based on Aaron's +1 and based on my manual stress 
testing using HBase's HLog class, which uses this method.

No unit tests since it's hard to unit test for performance, and the hflush 
equivalent is already tested by TestMultithreadedHflush
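For context, the shape of the fix being discussed is roughly the following (a simplified sketch, not the actual DFSOutputStream source):

```java
// The deprecated sync() wrapper must not re-introduce the lock that HDFS-895
// removed from hflush(); a "synchronized" sync() would serialize all HBase
// WAL writers on the stream monitor again.
public class OutputStreamSketch {
    // Before the fix this was "public synchronized void sync()", which made
    // every caller queue on the stream lock for the whole flush.
    @Deprecated
    public void sync() {
        hflush(); // delegate without holding the stream-wide lock
    }

    public void hflush() {
        // hflush does its own fine-grained synchronization internally:
        // enqueue the current packet, then wait for acks (elided here).
    }
}
```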

 DFSOutputStream.sync should not be synchronized
 ---

 Key: HDFS-3280
 URL: https://issues.apache.org/jira/browse/HDFS-3280
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-3280.txt


 HDFS-895 added an optimization to make hflush() much faster by 
 unsynchronizing it. But, we forgot to un-synchronize the deprecated 
 {{sync()}} wrapper method. This makes the HBase WAL really slow on 0.23+ 
 since it doesn't take advantage of HDFS-895 anymore.





[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252240#comment-13252240
 ] 

Todd Lipcon commented on HDFS-3256:
---

I think there's an issue here with safemode's delayed initialization of repl 
queues. As the DNs are checking in, when the cluster transitions from 
single-rack to multi-rack, it will call processMisReplicatedBlocks, even if the 
threshold (DFS_NAMENODE_REPL_QUEUE_THRESHOLD_PCT_KEY) hasn't been crossed. I 
think you need to somehow tie this back to the flag in safemode which 
determines whether misreplicated blocks have been processed yet.
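
The guard being asked for can be sketched as follows; `RackMonitor` and its methods are hypothetical stand-ins for the real NetworkTopology/BlockManager interaction, shown only to make the safemode interaction concrete:

```java
// Hypothetical sketch of the guard being discussed: rescan blocks on a
// single-rack -> multi-rack transition only once the replication queues
// have been initialized. Names are illustrative, not the real BlockManager API.
public class RackMonitor {
    private int rackCount = 1;
    private boolean replQueuesInitialized = false;
    private int rescans = 0;

    public void setReplQueuesInitialized(boolean initialized) {
        replQueuesInitialized = initialized;
    }

    // Called as each DataNode registers with its resolved rack.
    public void onRackCountChanged(int newCount) {
        boolean becameMultiRack = (rackCount == 1 && newCount > 1);
        rackCount = newCount;
        // Without the replQueuesInitialized check, this would fire while the
        // NN is still in startup safemode, before the repl-queue threshold
        // (DFS_NAMENODE_REPL_QUEUE_THRESHOLD_PCT_KEY) has been crossed.
        if (becameMultiRack && replQueuesInitialized) {
            processMisReplicatedBlocks();
        }
    }

    private void processMisReplicatedBlocks() {
        rescans++; // the real implementation re-checks every block's placement
    }

    public int rescanCount() {
        return rescans;
    }
}
```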

 HDFS considers blocks under-replicated if topology script is configured with 
 only 1 rack
 

 Key: HDFS-3256
 URL: https://issues.apache.org/jira/browse/HDFS-3256
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3256.patch, HDFS-3256.patch


 HDFS treats the mere presence of a topology script being configured as 
 evidence that there are multiple racks. If there is in fact only a single 
 rack, the NN will try to place the blocks on at least two racks, and thus 
 blocks will be considered to be under-replicated.





[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252560#comment-13252560
 ] 

Todd Lipcon commented on HDFS-3256:
---

Is this check actually correct? I think you need to check whether the repl 
queues are initialized, explicitly. For example, you could enter manual safe 
mode, in which case the repl queues are still being tracked, and it's incorrect 
to not call processMisReplicatedBlocks.

 HDFS considers blocks under-replicated if topology script is configured with 
 only 1 rack
 

 Key: HDFS-3256
 URL: https://issues.apache.org/jira/browse/HDFS-3256
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch


 HDFS treats the mere presence of a topology script being configured as 
 evidence that there are multiple racks. If there is in fact only a single 
 rack, the NN will try to place the blocks on at least two racks, and thus 
 blocks will be considered to be under-replicated.





[jira] [Commented] (HDFS-3255) HA DFS returns wrong token service

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252563#comment-13252563
 ] 

Todd Lipcon commented on HDFS-3255:
---

Hm, I'm just surprised, because it seemed to work when we tested running MR 
against an HA cluster. But maybe there's some bug that happens when it comes 
time to renew the token, or something?

bq. I can see if I can use Jitendra's feature to enable security for a unit 
test if you'd like.
When I tried to use that feature, I couldn't get it to work. Maybe you'll have 
better luck?

If you can describe a manual test scenario that seems good enough for me.

 HA DFS returns wrong token service
 --

 Key: HDFS-3255
 URL: https://issues.apache.org/jira/browse/HDFS-3255
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs client
Affects Versions: 2.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-3255.patch


 {{fs.getCanonicalService()}} must be equal to 
 {{fs.getDelegationToken(renewer).getService()}}.  When HA is enabled, the DFS 
 token's service is a logical uri, but {{dfs.getCanonicalService()}} is only 
 returning the hostname of the logical uri.





[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252614#comment-13252614
 ] 

Todd Lipcon commented on HDFS-3256:
---

patch looks good. One minor comment: can you please add an INFO log saying 
something like: "Datanode blah blah joining cluster has expanded a formerly 
single-rack cluster to multi-rack. Re-checking all blocks for replication, 
since they should now be replicated cross-rack"

 HDFS considers blocks under-replicated if topology script is configured with 
 only 1 rack
 

 Key: HDFS-3256
 URL: https://issues.apache.org/jira/browse/HDFS-3256
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, 
 HDFS-3256.patch


 HDFS treats the mere presence of a topology script being configured as 
 evidence that there are multiple racks. If there is in fact only a single 
 rack, the NN will try to place the blocks on at least two racks, and thus 
 blocks will be considered to be under-replicated.





[jira] [Commented] (HDFS-3259) NameNode#initializeSharedEdits should populate shared edits dir with edit log segments

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252660#comment-13252660
 ] 

Todd Lipcon commented on HDFS-3259:
---

A few issues:
- The file copy you're doing could fail with a half-written file in the middle. 
I think you need to copy to a tmp filename and then rename once the copy is 
done. You could use AtomicFileOutputStream here, actually, since fsyncing it 
seems reasonable.
- Rather than blindly casting, I think it's worth checking instanceof, and 
bailing out with an error if one of the journals isn't a FileJournalManager. 
The error can say that this initialization feature currently only works with 
file-based streams. That's better than an ugly ClassCastException stack trace
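
The copy-to-temp-then-rename pattern suggested in the first point looks roughly like this. This is a sketch using plain java.nio (the real patch could use AtomicFileOutputStream as noted above); the helper methods are added only to keep the example self-contained:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of copy-to-temp-then-rename: a crash mid-copy leaves only a *.tmp
// file behind, never a half-written file under the final name.
public class AtomicCopy {
    public static void copyAtomically(Path src, Path dst) {
        try {
            Path tmp = dst.resolveSibling(dst.getFileName() + ".tmp");
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
            // The rename publishes the file all-at-once (or fails cleanly).
            Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helpers that wrap checked IOExceptions, for brevity in examples.
    static Path tempWithContent(String content) {
        try {
            Path p = Files.createTempFile("edits", ".seg");
            Files.write(p, content.getBytes());
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(Path p) {
        try {
            return new String(Files.readAllBytes(p));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```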


 NameNode#initializeSharedEdits should populate shared edits dir with edit log 
 segments
 --

 Key: HDFS-3259
 URL: https://issues.apache.org/jira/browse/HDFS-3259
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, name-node
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3259.patch, HDFS-3259.patch


 Currently initializeSharedEdits formats the shared dir so that subsequent 
 edit log segments will be written there. However, it would be nice to 
 automatically populate this dir with edit log segments with transactions 
 going back to the last fsimage.





[jira] [Commented] (HDFS-3255) HA DFS returns wrong token service

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252668#comment-13252668
 ] 

Todd Lipcon commented on HDFS-3255:
---

k, sounds good. +1

 HA DFS returns wrong token service
 --

 Key: HDFS-3255
 URL: https://issues.apache.org/jira/browse/HDFS-3255
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs client
Affects Versions: 2.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-3255.patch


 {{fs.getCanonicalService()}} must be equal to 
 {{fs.getDelegationToken(renewer).getService()}}.  When HA is enabled, the DFS 
 token's service is a logical uri, but {{dfs.getCanonicalService()}} is only 
 returning the hostname of the logical uri.





[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252677#comment-13252677
 ] 

Todd Lipcon commented on HDFS-3256:
---

bq. not yet processing processing repl queues

Too many processings. You've been hanging out with Eli too much.


Also, I'd say we should just log it at DEBUG level for the "Not checking" case.

 HDFS considers blocks under-replicated if topology script is configured with 
 only 1 rack
 

 Key: HDFS-3256
 URL: https://issues.apache.org/jira/browse/HDFS-3256
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, 
 HDFS-3256.patch, HDFS-3256.patch


 HDFS treats the mere presence of a topology script being configured as 
 evidence that there are multiple racks. If there is in fact only a single 
 rack, the NN will try to place the blocks on at least two racks, and thus 
 blocks will be considered to be under-replicated.





[jira] [Commented] (HDFS-3267) TestBlocksWithNotEnoughRacks races with DN startup

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252784#comment-13252784
 ] 

Todd Lipcon commented on HDFS-3267:
---

The test can be made to fail by adding a sleep(5000) at the start of 
BPServiceActor's thread.

 TestBlocksWithNotEnoughRacks races with DN startup
 --

 Key: HDFS-3267
 URL: https://issues.apache.org/jira/browse/HDFS-3267
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor

 In TestBlocksWithNotEnoughRacks.testCorruptBlockRereplicatedAcrossRacks, it 
 restarts a DN, and then proceeds to call waitCorruptReplicas. But, because 
 of HDFS-3266, it doesn't actually wait very long while checking for the 
 corrupt block to be reported. Since the DN starts back up asynchronously, the 
 test will fail if it starts too slowly.





[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252788#comment-13252788
 ] 

Todd Lipcon commented on HDFS-3256:
---

+1

 HDFS considers blocks under-replicated if topology script is configured with 
 only 1 rack
 

 Key: HDFS-3256
 URL: https://issues.apache.org/jira/browse/HDFS-3256
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, 
 HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch


 HDFS treats the mere presence of a topology script being configured as 
 evidence that there are multiple racks. If there is in fact only a single 
 rack, the NN will try to place the blocks on at least two racks, and thus 
 blocks will be considered to be under-replicated.





[jira] [Commented] (HDFS-3268) Hdfs mishandles token service incompatible with HA

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252853#comment-13252853
 ] 

Todd Lipcon commented on HDFS-3268:
---

lgtm, except a typo in this comment:
{code}
+   * Get a canonical token service name for this client's tokens.  Null should
+   * tokens if the client is not using tokens.
{code}

 Hdfs mishandles token service  incompatible with HA
 

 Key: HDFS-3268
 URL: https://issues.apache.org/jira/browse/HDFS-3268
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs client
Affects Versions: 0.24.0, 2.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-3268.patch


 The {{Hdfs AbstractFileSystem}} is overwriting the token service set by the 
 {{DFSClient}}.  The service is not necessarily the correct one since 
 {{DFSClient}} is responsible for the service.  Most importantly, this 
 improper behavior is overwriting the HA logical service which indirectly 
 renders {{FileContext}} incompatible with HA.





[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252855#comment-13252855
 ] 

Todd Lipcon commented on HDFS-3092:
---

bq. Perhaps we can get away with this by using some assumptions on timeouts, or 
by additional constraints on the standby. Eg. that it only syncs with finalized 
edit segments.

That's my plan in HDFS-3077, and in fact that's the current behavior of the 
SBN, even when operating on NFS.

 Enable journal protocol based editlog streaming for standby namenode
 

 Key: HDFS-3092
 URL: https://issues.apache.org/jira/browse/HDFS-3092
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, name-node
Affects Versions: 0.24.0, 0.23.3
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: MultipleSharedJournals.pdf, MultipleSharedJournals.pdf, 
 MultipleSharedJournals.pdf


 Currently standby namenode relies on reading shared editlogs to stay current 
 with the active namenode, for namespace changes. BackupNode used streaming 
 edits from active namenode for doing the same. This jira is to explore using 
 journal protocol based editlog streams for the standby namenode. A daemon in 
 standby will get the editlogs from the active and write them to local edits. To 
 begin with, the existing standby mechanism of reading from a file will 
 continue to be used, reading from the local edits instead of from the shared edits.





[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252923#comment-13252923
 ] 

Todd Lipcon commented on HDFS-3092:
---

bq. By 'my plan' are you referring to an API on the journal node to read latest 
edits that replaces the current standby NN tailing code?

Yep - well, not replaces, but rather just implements the correct APIs in 
JournalManager. We already have read side APIs there to get an input stream 
starting at a given txid. We just need implementations that do the remote 
reads.

 Enable journal protocol based editlog streaming for standby namenode
 

 Key: HDFS-3092
 URL: https://issues.apache.org/jira/browse/HDFS-3092
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, name-node
Affects Versions: 0.24.0, 0.23.3
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: MultipleSharedJournals.pdf, MultipleSharedJournals.pdf, 
 MultipleSharedJournals.pdf


 Currently standby namenode relies on reading shared editlogs to stay current 
 with the active namenode, for namespace changes. BackupNode used streaming 
 edits from active namenode for doing the same. This jira is to explore using 
 journal protocol based editlog streams for the standby namenode. A daemon in 
 standby will get the editlogs from the active and write them to local edits. To 
 begin with, the existing standby mechanism of reading from a file will 
 continue to be used, reading from the local edits instead of from the shared edits.





[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-04-11 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251925#comment-13251925
 ] 

Todd Lipcon commented on HDFS-3094:
---

A few nits on the branch-1 patch:

{code}
+  System.err.println("Format aborted as dir " + curDir + " exits.");
{code}
typo: "exists". I would also reformat as: "Format aborted: " + curDir + " exists."

the same typo ("exits" for "exists") is in one of the test cases


{code}
+  boolean isConfirmationNeeded, boolean isInterActive) throws IOException {
{code}
Should be {{isInteractive}} -- not capital 'A'


{code}
+  StartupOption.FORMAT.getName() + "[" + StartupOption.FORCE.getName() +
+   " ] [" + StartupOption.NONINTERACTIVE.getName() + "] | [" +
{code}
Formatting is off here. When I run it I see:
{code}
Usage: java NameNode [-format[-force ] [-nonInteractive] | [-upgrade] | 
[-rollback] | [-finalize] | [-importCheckpoint]
{code}
should read:
{code}
Usage: java NameNode [-format [-force] [-nonInteractive]] | [-upgrade] | 
[-rollback] | [-finalize] | [-importCheckpoint]
{code}


 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Fix For: 2.0.0

 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch


 Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup 
 the directories in the local file system.
 -force : namenode formats the directories without prompting
 -nonInterActive : namenode format will return with an exit code of 1 if the 
 dir exists.





[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack

2012-04-11 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251958#comment-13251958
 ] 

Todd Lipcon commented on HDFS-3256:
---

Looks good. +1 pending jenkins

 HDFS considers blocks under-replicated if topology script is configured with 
 only 1 rack
 

 Key: HDFS-3256
 URL: https://issues.apache.org/jira/browse/HDFS-3256
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3256.patch


 HDFS treats the mere presence of a topology script being configured as 
 evidence that there are multiple racks. If there is in fact only a single 
 rack, the NN will try to place the blocks on at least two racks, and thus 
 blocks will be considered to be under-replicated.





[jira] [Commented] (HDFS-3255) HA DFS returns wrong token service

2012-04-11 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252039#comment-13252039
 ] 

Todd Lipcon commented on HDFS-3255:
---

hey Daryn. Fix looks good. Is there a manual test that can be run with MR, for 
example, to verify this as well? i.e. how did you discover this issue?

 HA DFS returns wrong token service
 --

 Key: HDFS-3255
 URL: https://issues.apache.org/jira/browse/HDFS-3255
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs client
Affects Versions: 2.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-3255.patch


 {{fs.getCanonicalService()}} must be equal to 
 {{fs.getDelegationToken(renewer).getService()}}.  When HA is enabled, the DFS 
 token's service is a logical uri, but {{dfs.getCanonicalService()}} is only 
 returning the hostname of the logical uri.





[jira] [Commented] (HDFS-3243) TestParallelRead timing out on jenkins

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250803#comment-13250803
 ] 

Todd Lipcon commented on HDFS-3243:
---

Seems like this test was lengthened by HDFS-2834. I'm not sure yet whether it 
represents a performance regression, or if the test itself was just changed in 
such a way that it runs much longer.

On branch-2:
{code}
  <testcase time="9.15" classname="org.apache.hadoop.hdfs.TestParallelRead" name="testParallelRead"/>
{code}

On trunk:
{code}
  <testcase time="23.397" classname="org.apache.hadoop.hdfs.TestParallelRead" name="testParallelReadCopying"/>
  <testcase time="133.218" classname="org.apache.hadoop.hdfs.TestParallelRead" name="testParallelReadByteBuffer"/>
  <testcase time="61.364" classname="org.apache.hadoop.hdfs.TestParallelRead" name="testParallelReadMixed"/>
{code}

I also see a lot of blocked threads in the jstack on trunk. I asked Henry to 
take a look at this.

 TestParallelRead timing out on jenkins
 --

 Key: HDFS-3243
 URL: https://issues.apache.org/jira/browse/HDFS-3243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, test
Reporter: Todd Lipcon
Assignee: Henry Robinson

 Trunk builds have been failing recently due to a TestParallelRead timeout. It 
 doesn't report in the Jenkins failure list because surefire handles timeouts 
 really poorly.





[jira] [Commented] (HDFS-3243) TestParallelRead timing out on jenkins

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250820#comment-13250820
 ] 

Todd Lipcon commented on HDFS-3243:
---

fwiw I tried copying the old TestParallelRead from branch-2 into trunk, and it 
runs just as fast there as it does in branch-2. So this seems like an issue 
with the new test code, rather than a regression in the read performance of the 
existing path.

 TestParallelRead timing out on jenkins
 --

 Key: HDFS-3243
 URL: https://issues.apache.org/jira/browse/HDFS-3243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, test
Reporter: Todd Lipcon
Assignee: Henry Robinson

 Trunk builds have been failing recently due to a TestParallelRead timeout. It 
 doesn't report in the Jenkins failure list because surefire handles timeouts 
 really poorly.





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250954#comment-13250954
 ] 

Todd Lipcon commented on HDFS-2983:
---

How about the following proposal:

- change the check for DN registration so that, if the DN's ctime differs from 
the NN's ctime (i.e. the NN has started a snapshot-style upgrade), then the 
version check will be strict
- file a follow-up JIRA to add a cluster version summary to the web UI and to 
the NN metrics, allowing ops to monitor whether they have machines that might 
have missed a rolling upgrade

Does that address your concern? I agree with your point that it can be 
confusing to manage, but not sure what the specific change you're asking for is.
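
A minimal sketch of that first bullet, with invented names (the real check lives in the DN registration path; this only illustrates the strict-when-upgrading fallback):

```java
// Invented names sketching the proposal: stay lenient about software versions
// during normal operation, but fall back to the strict check once a
// snapshot-style upgrade has begun (i.e. the ctimes differ).
public class VersionCheck {
    public static boolean registrationAllowed(long nnCTime, long dnCTime,
                                              String nnVersion, String dnVersion) {
        if (nnCTime != dnCTime) {
            // Snapshot-style upgrade in progress: require an exact match.
            return nnVersion.equals(dnVersion);
        }
        // Matching ctimes: permit rolling upgrades within a release.
        return true;
    }
}
```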

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, 
 HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-base RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3245) Add metrics and web UI for cluster version summary

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250998#comment-13250998
 ] 

Todd Lipcon commented on HDFS-3245:
---

Another thing: we already have some per-datanode info in JMX. We should be sure 
to add the registered software version to this info (and to the DFS node list 
pages)

bq. It would be great to have some sort of statistics about the clients as well

I agree it would be nice, but we don't currently send software version strings 
in the IPC handshake or anything. So, I think we should handle it separately 
(this JIRA is just about exposing info we can already easily track)

 Add metrics and web UI for cluster version summary
 --

 Key: HDFS-3245
 URL: https://issues.apache.org/jira/browse/HDFS-3245
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0
Reporter: Todd Lipcon

 With the introduction of protocol compatibility, once HDFS-2983 is committed, 
 we have the possibility that different nodes in a cluster are running 
 different software versions. To aid operators, we should add the ability to 
 summarize the status of versions in the cluster, so they can easily determine 
 whether a rolling upgrade is in progress or if some nodes missed an upgrade 
 (e.g. maybe they were out of service when the software was updated)





[jira] [Commented] (HDFS-3247) Improve bootstrapStandby behavior when original NN is not active

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251029#comment-13251029
 ] 

Todd Lipcon commented on HDFS-3247:
---

I think we could do one of the following:
1) Improve the error message to note that the admin should make sure the other 
NN is active before proceeding
2) Have it automatically transition it to active if this is the case.

What do you think? I think 1 makes more sense, since the admin has to make an 
explicit decision.

 Improve bootstrapStandby behavior when original NN is not active
 

 Key: HDFS-3247
 URL: https://issues.apache.org/jira/browse/HDFS-3247
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor

 Currently, if you run bootstrapStandby while the first NN is in standby mode, 
 it will spit out an ugly StandbyException with a trace. Instead, it should 
 print an explanation that you should transition the first NN to active before 
 bootstrapping.





[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251118#comment-13251118
 ] 

Todd Lipcon commented on HDFS-3094:
---

+1, I'll commit this momentarily

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch


 Currently, bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force: the namenode formats the directories without prompting.
 -nonInteractive: the namenode format returns with an exit code of 1 if a 
 dir already exists.





[jira] [Commented] (HDFS-3248) bootstrapstanby repeated twice in hdfs namenode usage message

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251123#comment-13251123
 ] 

Todd Lipcon commented on HDFS-3248:
---

+1, thanks for fixing this.

 bootstrapstanby repeated twice in hdfs namenode usage message
 -

 Key: HDFS-3248
 URL: https://issues.apache.org/jira/browse/HDFS-3248
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3248.002.patch


 The HDFS usage message repeats bootstrapStandby twice.
 {code}
 Usage: java NameNode [-backup] | [-checkpoint] | [-format[-clusterid cid ]] | 
 [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | 
 [-bootstrapStandby] | [-initializeSharedEdits] | [-bootstrapStandby] | 
 [-recover [ -force ] ]
 {code}





[jira] [Commented] (HDFS-3244) Remove dead writable code from hdfs/protocol

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251122#comment-13251122
 ] 

Todd Lipcon commented on HDFS-3244:
---

+1, thanks for the cleanup, glad to be rid of that code.

 Remove dead writable code from hdfs/protocol
 

 Key: HDFS-3244
 URL: https://issues.apache.org/jira/browse/HDFS-3244
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3244.txt


 While doing HDFS-3238 I noticed that there's more dead writable code in 
 hdfs/protocol. Let's remove it.





[jira] [Commented] (HDFS-3243) TestParallelRead timing out on jenkins

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251127#comment-13251127
 ] 

Todd Lipcon commented on HDFS-3243:
---

+1, verified in the test results above that TestParallelRead passed relatively 
quickly.

 TestParallelRead timing out on jenkins
 --

 Key: HDFS-3243
 URL: https://issues.apache.org/jira/browse/HDFS-3243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, test
Reporter: Todd Lipcon
Assignee: Henry Robinson
 Attachments: HDFS-3243.0.patch


 Trunk builds have been failing recently due to a TestParallelRead timeout. It 
 doesn't report in the Jenkins failure list because surefire handles timeouts 
 really poorly.





[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251137#comment-13251137
 ] 

Todd Lipcon commented on HDFS-3246:
---

Agreed -- this would be particularly useful for HBase, which does a lot of preads.

 pRead equivalent for direct read path
 -

 Key: HDFS-3246
 URL: https://issues.apache.org/jira/browse/HDFS-3246
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Henry Robinson
Assignee: Henry Robinson

 There is no pread equivalent in ByteBufferReadable. We should consider adding 
 one. It would be relatively easy to implement for the distributed case 
 (certainly compared to HDFS-2834), since DFSInputStream does most of the 
 heavy lifting.





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251150#comment-13251150
 ] 

Todd Lipcon commented on HDFS-2983:
---

Cool, thanks Konstantin. Aaron, does the above proposal sound good to you too? 
Happy to re-review when you update the patch

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, 
 HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash; Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it will take another 0.23 minor release or so before we're ready to 
 commit to making the minor versions compatible.





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-10 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251339#comment-13251339
 ] 

Todd Lipcon commented on HDFS-2983:
---

+1, reviewed the delta between the latest two patches. Looks good, and nice 
tests.

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, 
 HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, 
 HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash; Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it will take another 0.23 minor release or so before we're ready to 
 commit to making the minor versions compatible.





[jira] [Commented] (HDFS-3229) add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode.

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249929#comment-13249929
 ] 

Todd Lipcon commented on HDFS-3229:
---

Can you give an example of the subtle issues you're referring to? The advantage 
of re-using HTTP is that we've already tested that code path, and it supports 
things like checksumming, etc.

 add JournalProtocol RPCs to list finalized edit segments, and read edit 
 segment file from JournalNode. 
 ---

 Key: HDFS-3229
 URL: https://issues.apache.org/jira/browse/HDFS-3229
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Brandon Li
Assignee: Brandon Li







[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249939#comment-13249939
 ] 

Todd Lipcon commented on HDFS-3222:
---

bq. I think our proposal won't work here, because by the time of hsync, DN will 
not report to NN anyway.

On the first hflush() for a block, it calls NN.fsync(), which internally calls 
persistBlocks(). Currently, the fsync call doesn't give a length, but perhaps 
it could?

The other thought is that, after a restart, a block that was previously being 
written would be in the under construction state, but with no expectedTargets. 
This differs from the case where a block has been allocated but not yet written 
to replicas. We could use that to set a new flag in the LocatedBlock response 
indicating that the block is not zero-length, but rather corrupt.
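The distinction drawn above can be sketched as a small predicate. All names below (`LocatedBlockSketch`, `markCorrupt`, the fields) are illustrative stand-ins invented for this sketch, not the actual LocatedBlock API:

```java
// Hypothetical sketch: distinguish a block that was being written before a
// restart (under construction, but with no expected pipeline targets) from a
// freshly allocated block that simply has not been written to replicas yet.
class LocatedBlockSketch {
    boolean underConstruction;
    int expectedTargets;   // pipeline DNs recorded before the restart
    int reportedLocations; // replicas DNs have reported since startup

    // True when the block had writers before the restart but no DN has
    // reported a replica yet: treat it as corrupt/missing, not length 0.
    boolean markCorrupt() {
        return underConstruction && expectedTargets == 0
                && reportedLocations == 0;
    }
}
```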


 DFSInputStream#openInfo should not silently get the length as 0 when 
 locations length is zero for last partial block.
 -

 Key: HDFS-3222
 URL: https://issues.apache.org/jira/browse/HDFS-3222
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 1.0.3, 2.0.0, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-3222-Test.patch


 I have seen one situation with an HBase cluster.
 The scenario is as follows:
 1) 1.5 blocks had been written and synced.
 2) Suddenly, the cluster was restarted.
 A reader then opened the file and tried to get its length. By this time the 
 DNs holding the partial block had not yet reported to the NN, so the 
 locations for this partial block were empty. In this case, DFSInputStream 
 assumes one full block as the final size.
 The reader likewise takes one block size as the final length and sets its 
 end marker there, so it ends up reading only partial data. Because of this, 
 the HMaster could not replay the complete edits.
 This actually happened on a 0.20-based version; looking at the code, the 
 same issue should be present in trunk as well.
 {code}
 int replicaNotFoundCount = locatedblock.getLocations().length;

 for (DatanodeInfo datanode : locatedblock.getLocations()) {
   ..
   ..
 }
 // Namenode told us about these locations, but none know about the replica.
 // This means we hit the race between pipeline creation start and end.
 // We require all 3 because some other exception could have happened
 // on a DN that has it; we want to report that error.
 if (replicaNotFoundCount == 0) {
   return 0;
 }
 {code}





[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250002#comment-13250002
 ] 

Todd Lipcon commented on HDFS-3222:
---

bq. My point is, even though client flushed the data, DNs will not report to NN 
right. Did you check the test above?
Right, but the client reports to the NN. So, the client could report the number 
of bytes hflushed, and the NN could fill in the last block with that 
information when it persists it.
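A minimal sketch of this idea, assuming a hypothetical extension of the fsync() RPC that carries the hflushed byte count (the class and method names below are invented for illustration, not the real NameNode code):

```java
// Hypothetical sketch: the NN persisting a client-reported hflushed length
// for the last, under-construction block of a file.
class BlockUnderConstructionSketch {
    long numBytes;            // length the NN would persist for this block
    boolean lengthFromClient; // true once set via an fsync() report

    // Imagined extension of NN.fsync(): the client passes the number of
    // bytes it has successfully hflushed so far.
    void onClientFsync(long hflushedBytes) {
        if (hflushedBytes > numBytes) { // never shrink on a stale report
            numBytes = hflushedBytes;
            lengthFromClient = true;
        }
    }
}
```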

bq. You mean we will retry until we get the locations?
Yea -- treat it the same as we treat a corrupt file.

{quote}
1) client wants to read some partial data which exists in first block itself,
2) open may try to get complete length, and that will block if we retry until 
DNs reports to NN.
3) But really that DNs down for long time.

This time, we can not read even until the specified length, which is less than 
the start offset of partial block.
{quote}

That's true. Is it possible for us to change the client code to defer this code 
path until either (a) the client wants to read from the partial block, or (b) 
the client explicitly asks for the file length?

Alternatively, maybe this is so rare that it doesn't matter, and it's OK to 
disallow reading from an unrecovered file whose last block is missing all of 
its block locations after a restart.

 DFSInputStream#openInfo should not silently get the length as 0 when 
 locations length is zero for last partial block.
 -

 Key: HDFS-3222
 URL: https://issues.apache.org/jira/browse/HDFS-3222
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 1.0.3, 2.0.0, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-3222-Test.patch


 I have seen one situation with an HBase cluster.
 The scenario is as follows:
 1) 1.5 blocks had been written and synced.
 2) Suddenly, the cluster was restarted.
 A reader then opened the file and tried to get its length. By this time the 
 DNs holding the partial block had not yet reported to the NN, so the 
 locations for this partial block were empty. In this case, DFSInputStream 
 assumes one full block as the final size.
 The reader likewise takes one block size as the final length and sets its 
 end marker there, so it ends up reading only partial data. Because of this, 
 the HMaster could not replay the complete edits.
 This actually happened on a 0.20-based version; looking at the code, the 
 same issue should be present in trunk as well.
 {code}
 int replicaNotFoundCount = locatedblock.getLocations().length;

 for (DatanodeInfo datanode : locatedblock.getLocations()) {
   ..
   ..
 }
 // Namenode told us about these locations, but none know about the replica.
 // This means we hit the race between pipeline creation start and end.
 // We require all 3 because some other exception could have happened
 // on a DN that has it; we want to report that error.
 if (replicaNotFoundCount == 0) {
   return 0;
 }
 {code}





[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250032#comment-13250032
 ] 

Todd Lipcon commented on HDFS-3222:
---

bq. It may be difficult for the clients to differ whether this is real 
corruption or it will be recovered after DN reports to NN.

What's the difference? If none of the DNs holding a block have reported a 
replica, it's missing/corrupt. The same is true of finalized blocks - if three 
DNs crash, and we have no replicas anymore, it still might come back if an 
admin fixes one of the DNs.

bq. you mean reader will pass the option? (a) or (b).

Sorry, I wasn't clear. Right now, the behavior is that, when we call open() on 
a file which is under construction, we always go to the DNs holding the last 
block to find the length. My proposal is the following:
- on open(), do not determine the visible length of the file. Set the member 
variable to something like -1 to indicate it's still unknown
- in the code that opens a block reader, change it to check if it's about to 
read from the last block. If it is, try to determine the visible length.
- in the explicit getVisibleLength() call, if it's not determined yet, try to 
determine the visible length

With the above changes, we can allow a client who only wants to access the 
first blocks of a file to do so without having to contact the DNs holding the 
last block. But as soon as the client wants to access the under-construction 
block, or explicitly wants to know the visible length, then we go to the DNs.
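The three bullets above might look roughly like this. Everything here (`DFSInputStreamSketch`, `fetchLengthFromDatanodes`, the -1 sentinel) is a hypothetical sketch of the proposal, not the real DFSClient code:

```java
// Illustrative sketch of the lazy visible-length proposal.
class DFSInputStreamSketch {
    private static final long LENGTH_UNKNOWN = -1L;
    private long visibleLength = LENGTH_UNKNOWN; // NOT resolved in open()

    // Stand-in for the round-trip to the DNs holding the last
    // (under-construction) block.
    private long fetchLengthFromDatanodes() {
        return 4096L; // placeholder result for the sketch
    }

    // Reads confined to earlier blocks never need the DN round-trip.
    boolean needsLength(long readEndOffset, long lastBlockStart) {
        return readEndOffset > lastBlockStart;
    }

    // Called only when a read is about to touch the last block, or when
    // the client explicitly asks for the visible length.
    long getVisibleLength() {
        if (visibleLength == LENGTH_UNKNOWN) {
            visibleLength = fetchLengthFromDatanodes();
        }
        return visibleLength;
    }
}
```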


 DFSInputStream#openInfo should not silently get the length as 0 when 
 locations length is zero for last partial block.
 -

 Key: HDFS-3222
 URL: https://issues.apache.org/jira/browse/HDFS-3222
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 1.0.3, 2.0.0, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-3222-Test.patch


 I have seen one situation with an HBase cluster.
 The scenario is as follows:
 1) 1.5 blocks had been written and synced.
 2) Suddenly, the cluster was restarted.
 A reader then opened the file and tried to get its length. By this time the 
 DNs holding the partial block had not yet reported to the NN, so the 
 locations for this partial block were empty. In this case, DFSInputStream 
 assumes one full block as the final size.
 The reader likewise takes one block size as the final length and sets its 
 end marker there, so it ends up reading only partial data. Because of this, 
 the HMaster could not replay the complete edits.
 This actually happened on a 0.20-based version; looking at the code, the 
 same issue should be present in trunk as well.
 {code}
 int replicaNotFoundCount = locatedblock.getLocations().length;

 for (DatanodeInfo datanode : locatedblock.getLocations()) {
   ..
   ..
 }
 // Namenode told us about these locations, but none know about the replica.
 // This means we hit the race between pipeline creation start and end.
 // We require all 3 because some other exception could have happened
 // on a DN that has it; we want to report that error.
 if (replicaNotFoundCount == 0) {
   return 0;
 }
 {code}





[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250068#comment-13250068
 ] 

Todd Lipcon commented on HDFS-3094:
---

Hi Aprit. The patch looks good now, but it seems to have developed some 
conflicts against trunk

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch


 Currently, bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force: the namenode formats the directories without prompting.
 -nonInteractive: the namenode format returns with an exit code of 1 if a 
 dir already exists.





[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250070#comment-13250070
 ] 

Todd Lipcon commented on HDFS-3094:
---

oops, please excuse my typo of your name, _Arpit_!

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch


 Currently, bin/hadoop namenode -format prompts the user for a Y/N to set up 
 the directories in the local file system.
 -force: the namenode formats the directories without prompting.
 -nonInteractive: the namenode format returns with an exit code of 1 if a 
 dir already exists.





[jira] [Commented] (HDFS-3004) Implement Recovery Mode

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250112#comment-13250112
 ] 

Todd Lipcon commented on HDFS-3004:
---

Can you add a release note field for this issue, with brief description of the 
new feature and a pointer to the docs that describe it?

 Implement Recovery Mode
 ---

 Key: HDFS-3004
 URL: https://issues.apache.org/jira/browse/HDFS-3004
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.0.0

 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
 HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
 HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
 HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, 
 HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch, 
 HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch, 
 HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch, 
 HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch, 
 HDFS-3004.037.patch, HDFS-3004.038.patch, HDFS-3004.039.patch, 
 HDFS-3004.040.patch, HDFS-3004.041.patch, HDFS-3004.042.patch, 
 HDFS-3004.042.patch, HDFS-3004.042.patch, HDFS-3004.043.patch, 
 HDFS-3004__namenode_recovery_tool.txt


 When the NameNode metadata is corrupt for some reason, we want to be able to 
 fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
 world, we never would.  However, bad data on disk can happen from time to 
 time, because of hardware errors or misconfigurations.  In the past we have 
 had to correct it manually, which is time-consuming and which can result in 
 downtime.
 Recovery mode is initiated by the system administrator.  When the NameNode 
 starts up in Recovery Mode, it will try to load the FSImage file, apply all 
 the edits from the edits log, and then write out a new image.  Then it will 
 shut down.
 Unlike in the normal startup process, the recovery mode startup process will 
 be interactive.  When the NameNode finds something that is inconsistent, it 
 will prompt the operator as to what it should do.   The operator can also 
 choose to take the first option for all prompts by starting up with the '-f' 
 flag, or typing 'a' at one of the prompts.
 I have reused as much code as possible from the NameNode in this tool.  
 Hopefully, the effort that was spent developing this will also make the 
 NameNode editLog and image processing even more robust than it already is.
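The prompting behavior described above (take the first option for everything via '-f', or type 'a' at a prompt) can be sketched roughly as follows; this is an illustrative model, not the actual Recovery Mode code:

```java
import java.util.Scanner;

// Sketch of an interactive recovery prompt where '-f' (force) or answering
// 'a' makes every subsequent prompt auto-select the first option.
class RecoveryPromptSketch {
    private boolean alwaysFirst;
    private final Scanner in;

    RecoveryPromptSketch(boolean force, Scanner in) {
        this.alwaysFirst = force;
        this.in = in;
    }

    // Returns the index of the chosen option; option 0 is the default.
    int ask(String question, String[] options) {
        if (alwaysFirst) {
            return 0;
        }
        System.out.println(question);
        for (int i = 0; i < options.length; i++) {
            System.out.println("  " + i + ") " + options[i]);
        }
        String answer = in.nextLine().trim();
        if (answer.equals("a")) { // 'a': first option, now and for all prompts
            alwaysFirst = true;
            return 0;
        }
        return Integer.parseInt(answer);
    }
}
```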





[jira] [Commented] (HDFS-3055) Implement recovery mode for branch-1

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250131#comment-13250131
 ] 

Todd Lipcon commented on HDFS-3055:
---

- can you explain the changes in FSNamesystem.java?
- Can you update the logging in the test cases to use 
StringUtils.stringifyException to match trunk?
- Did you run all the existing tests in branch-1? The one difference that I can 
see that might cause a failure is that the IOException thrown during a failed 
startup used to retain the exception {{t}} as its cause, but no longer does.

Otherwise looks good.


 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch


 Implement recovery mode for branch-1





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250152#comment-13250152
 ] 

Todd Lipcon commented on HDFS-2983:
---

I did a little investigation to try to answer Konstantin's questions above.

First, I'll summarize our current behavior, verified on 0.23.1 release (I 
didn't understand this thoroughly before trying it out):

- In a running cluster, if you restart the NN without the {{-upgrade}} flag, 
then the DataNodes will happily re-register without exiting.
- If you restart the NN with {{-upgrade}}, then when the DN next heartbeats, it 
will fail the {{verifyRequest()}} check, since the registration ID's namespace 
fields no longer match (the ctime has been incremented by the upgrade). This 
causes the DataNode to exit.
- Of course, restarting the DN at this point makes it take the snapshot and 
participate in the upgrade as expected.

So, to try to respond to Konstantin's questions, here are a couple example 
scenarios:

*Scenario 1*: rolling upgrade without doing a snapshot upgrade (for emergency 
bug fixes, hot fixes, MR fixes, other fixes which we don't expect to affect 
data reliability):

- Leave the NN running, on the old version.
- On each DN, in succession: (1) shutdown DN, (2) upgrade software to the new 
version, (3) start DN

The above is sufficient if the changes are scoped only to DNs. If the change 
also affects the NN, then you will need to add the following step, either at 
the beginning or end of the process:

- shutdown NN. upgrade installed software. start NN on new version

In the case of an HA setup, we can do the NN upgrade without downtime:

- shutdown SBN. upgrade SBN software. start SBN.
- failover to SBN running new version.
- Shutdown previous active. Upgrade software. Start previous active
- Optionally fail back

*Scenario 2*: upgrade to a version with a new layout version (LV)

In this case, a snapshot style upgrade is required -- the NN will not restart 
without the -upgrade flag, and a DN will not connect to a NN with a different 
LV. So the scenario is the same as today:

- Shutdown entire cluster
- Upgrade all software in the cluster
- Start cluster with {{-upgrade}} flag
-- any nodes that missed the software upgrade will fail to connect, since their 
LV does not match  (this patch retains that behavior)

*Scenario 3*: upgrade to a version with same layout version, but some data risk 
(for example upgrading to a version with bug fixes pertaining to replication 
policies, corrupt block detection, etc)

In this scenario, the NN does not mandate a {{-upgrade}} flag, but as Sanjay 
mentioned above, it can still be useful for data protection. As with today, if 
the user does not want the extra protection, this scenario can be treated 
identically to scenario 1. If the user does want the protection, it can be 
treated identically to scenario 2. Scenario 2 remains safe because of the check 
against the NameNode's {{ctime}} matching the DN's {{ctime}}. As soon as you 
restart the NN with the {{-upgrade}} flag, all running DNs will exit. Any newly 
started DN will notice the new namespace ctime and take part in the snapshot 
upgrade.
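As a rough illustration, the registration decision described across these scenarios can be sketched as follows. This is an invented simplification, not the actual NameNode code (the real check lives in the verifyRequest path mentioned above), and all names here are hypothetical:

```java
// Simplified sketch of the relaxed registration check described in the
// scenarios above. Class and method names are invented for illustration.
public class RegistrationCheck {
    /**
     * Returns true if the DN may register (rolling upgrade, scenarios 1/3),
     * false if it must first take a snapshot (NN restarted with -upgrade,
     * so its ctime was bumped), and throws if the layout versions simply
     * do not match (scenario 2).
     */
    static boolean mayRegister(int nnLayoutVersion, int dnLayoutVersion,
                               long nnCtime, long dnCtime) {
        if (nnLayoutVersion != dnLayoutVersion) {
            // Different layout version: hard reject, as today.
            throw new IllegalStateException("layout version mismatch");
        }
        // Same layout version: a ctime mismatch means an -upgrade restart
        // happened, so the DN must participate in the snapshot upgrade.
        return nnCtime == dnCtime;
    }

    public static void main(String[] args) {
        System.out.println(mayRegister(40, 40, 100L, 100L)); // true: rolling upgrade OK
        System.out.println(mayRegister(40, 40, 101L, 100L)); // false: snapshot required
    }
}
```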



Does the above description address your concerns? Another idea would be to add 
a new configuration option like {{dfs.allow.rolling.upgrades}} which enables 
the new behavior, so an admin who prefers not to use the feature can disallow 
it completely.


 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, 
 HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3216) DatanodeID should support multiple IP addresses

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250181#comment-13250181
 ] 

Todd Lipcon commented on HDFS-3216:
---

bq. #2 Yes, when reading/writing DatanodeInfos to/from streams (same as before 
when creating a DatanodeID w/o a name)

When do we read/write DatanodeInfo from streams, now that we are pb-ified? 
i.e. is the writable interface even used anymore?


{code}
+   * Return the canonical IP address for this DatanodeID. Not all uses
+   * of DatanodeID are multi-IP aware, or would multiple IPs, therefore
+   * we use the first address as the canonical one.
{code}
ENOTASENTENCE


bq. #1 We still need the notion of canonical IP, mostly for cases that don't 
care about multiple IP addresses. Updated the javadoc.

How is it ensured that the canonical IP is kept consistent across DN 
restarts, for example? It's just whichever one is listed first in the DN-side 
configuration?

bq. Fixed the cast, now casts to String and serializes/deserializes the IPs, 
the test does check this (was failing now passes).

That's a little strange, to serialize it into a comma-separated list inside 
JSON. It's not possible to get Jackson to serialize it as a proper JSON array? 
Perhaps using a List&lt;String&gt; inside the map?

 DatanodeID should support multiple IP addresses
 ---

 Key: HDFS-3216
 URL: https://issues.apache.org/jira/browse/HDFS-3216
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3216.txt, hdfs-3216.txt


 The DatanodeID has a single field for the IP address, for HDFS-3146 we need 
 to extend it to support multiple addresses.





[jira] [Commented] (HDFS-3055) Implement recovery mode for branch-1

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250194#comment-13250194
 ] 

Todd Lipcon commented on HDFS-3055:
---

OK. +1, patch looks good. Please run all the branch-1 unit tests so we don't 
introduce any other failures - should be OK but best to be safe on the stable 
branch. When you report back, I'll commit.

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch, 
 HDFS-3055-b1.006.patch


 Implement recovery mode for branch-1





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250211#comment-13250211
 ] 

Todd Lipcon commented on HDFS-2983:
---

{code}
+    if (!dnVersion.equals(nnVersion)) {
+      LOG.info("Reported DataNode version '" + dnVersion + "' does not match " +
+          "NameNode version '" + nnVersion + "' but is within acceptable " +
+          "limits. Note: This is normal during a rolling upgrade.");
+    }
{code}
Can you also please include the DN IP address in this log message?


- Nice lengthy javadoc on VersionUtil.compareVersions. Can you please add 
something like: "This method of comparison is similar to the method used by 
package versioning systems like deb and RPM"

and also maybe give one example of what you mean? e.g. add "For example, 
Hadoop 0.3 < Hadoop 0.20, even though naive string comparison would consider 
it larger."
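A component-wise numeric comparison in the spirit of deb/rpm version ordering might look like the sketch below. This is an illustration only, not Hadoop's actual VersionUtil implementation, and it handles only dot-separated numeric components (no suffixes like "-alpha"):

```java
// Illustrative component-wise version comparison: splits on "." and
// compares each component numerically, so 0.3 < 0.20 and 10.0.0 > 2.0.
public class VersionCompare {
    static int compareVersions(String v1, String v2) {
        String[] a = v1.split("\\.");
        String[] b = v2.split("\\.");
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Missing trailing components compare as 0, so "1.0" == "1.0.0".
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // 0.3 < 0.20 numerically, though "0.3" > "0.20" as plain strings.
        System.out.println(compareVersions("0.3", "0.20") < 0);   // true
        System.out.println(compareVersions("10.0.0", "2.0") > 0); // true
    }
}
```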

Otherwise, looks great. +1 from my standpoint. Konstantin/Sanjay - can you 
please comment regarding the above discussion? While I agree that there are 
more improvements to be made, I don't think this patch will hurt things. Or, if 
you are nervous about it, can we commit this with a flag to allow rolling 
upgrade if the operator permits it?


 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, 
 HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250235#comment-13250235
 ] 

Todd Lipcon commented on HDFS-3094:
---

Sorry again Arpit - looks like the commit of HDFS-3004 caused another conflict 
here just a couple hours ago...

 add -nonInteractive and -force option to namenode -format command
 -

 Key: HDFS-3094
 URL: https://issues.apache.org/jira/browse/HDFS-3094
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.24.0, 1.0.2
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, 
 HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, 
 HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, 
 HDFS-3094.patch


 Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup 
 the directories in the local file system.
 -force : namenode formats the directories without prompting
 -nonInterActive : namenode format will return with an exit code of 1 if the 
 dir exists.





[jira] [Commented] (HDFS-3229) add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode.

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250251#comment-13250251
 ] 

Todd Lipcon commented on HDFS-3229:
---

bq. However, if we believe we need web UI for JournalNode, we need the port 
anyways.

I think it's a good idea, since we have other endpoints in our default HTTP 
server that are very useful for ops -- for example the /jmx servlet and the 
/conf servlet can both be very handy. I also think exposing a basic web UI is 
helpful to operators who might try to understand the current state of the 
system.

bq. Suppose we used HTTP server to synchronize the lagging JournalNode by 
downloading missed edit logs from another Journal Node. Firstly, the lagging JN 
needs to get (e.g., by asking for NN) a list of JNs with full set of edit logs. 
Then, it downloads the missed logs from a good JN through http, while it could 
accept streamed logs from NN through rpc at the same time. Given the two 
servers are working on different file sets(finalized logs vs in-progress log), 
synchronizing them seems not a concern.

Right - this is the same process that the 2NN uses to synchronize finalized log 
segments from the NN. See SecondaryNameNode.downloadCheckpointFiles for the 
code.

 add JournalProtocol RPCs to list finalized edit segments, and read edit 
 segment file from JournalNode. 
 ---

 Key: HDFS-3229
 URL: https://issues.apache.org/jira/browse/HDFS-3229
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Brandon Li
Assignee: Brandon Li







[jira] [Commented] (HDFS-3236) NameNode does not initialize generic conf keys when started with -initializeSharedEditsDir

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250312#comment-13250312
 ] 

Todd Lipcon commented on HDFS-3236:
---

+1 pending jenkins

 NameNode does not initialize generic conf keys when started with 
 -initializeSharedEditsDir
 --

 Key: HDFS-3236
 URL: https://issues.apache.org/jira/browse/HDFS-3236
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, name-node
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Attachments: HDFS-3236.patch


 This means that configurations that scope the location of the 
 name/edits/shared edits dirs by nameservice or namenode won't work with `hdfs 
 namenode -initializeSharedEdits'.





[jira] [Commented] (HDFS-3238) ServerCommand and friends don't need to be writables

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250359#comment-13250359
 ] 

Todd Lipcon commented on HDFS-3238:
---

+1 pending jenkins results

 ServerCommand and friends don't need to be writables
 

 Key: HDFS-3238
 URL: https://issues.apache.org/jira/browse/HDFS-3238
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3238.txt


 We can remove the writable infrastructure from the ServerCommand classes as 
 they're not used across clients and we're using PB within the server side. 





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250366#comment-13250366
 ] 

Todd Lipcon commented on HDFS-2983:
---

bq. The scenario that scares me is if somebody does a snapshot, then several 
rolling upgrades, and then decides to rollback. This may be possible, but seems 
to be very much error-prone.

Why is this scenario different than if somebody does a snapshot, then several 
_non-rolling_ upgrades, then decides to rollback? In both cases, we have the 
case of a newer version trying to do a rollback to an older version snapshot. 
Right?

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, 
 HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.

2012-04-09 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250369#comment-13250369
 ] 

Todd Lipcon commented on HDFS-3222:
---

Sounds good to me.

 DFSInputStream#openInfo should not silently get the length as 0 when 
 locations length is zero for last partial block.
 -

 Key: HDFS-3222
 URL: https://issues.apache.org/jira/browse/HDFS-3222
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 1.0.3, 2.0.0, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-3222-Test.patch


 I have seen one situation with Hbase cluster.
 Scenario is as follows:
 1)1.5 blocks has been written and synced.
 2)Suddenly cluster has been restarted.
 Reader opened the file and tried to get the length. By this time the DNs 
 containing the partial block had not reported to the NN, so the locations 
 for this partial block were 0. In this case, DFSInputStream assumes 1 block 
 size as the final size.
 But the reader also assumes that 1 block size is the final length and sets 
 its end marker, so the reader ends up reading only partial data. Due to 
 this, the HMaster could not replay the complete edits. 
 Actually this happened with the 20 version. Looking at the code, the same 
 should be present in trunk as well.
 {code}
 int replicaNotFoundCount = locatedblock.getLocations().length;
 
 for(DatanodeInfo datanode : locatedblock.getLocations()) {
 ..
 ..
  // Namenode told us about these locations, but none know about the replica
 // means that we hit the race between pipeline creation start and end.
 // we require all 3 because some other exception could have happened
 // on a DN that has it.  we want to report that error
 if (replicaNotFoundCount == 0) {
   return 0;
 }
 {code}





[jira] [Commented] (HDFS-3146) Datanode should be able to register multiple network interfaces

2012-04-08 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249638#comment-13249638
 ] 

Todd Lipcon commented on HDFS-3146:
---

{code}
+  public static InetSocketAddress[] getInterfaceAddrs(
+  String interfaceNames[], int port) throws UnknownHostException {
{code}

Sorry I missed this in the earlier review of this function, but I think it 
would be better to call the parameter something like {{interfaceSpecs}} -- 
because each one may specify an interface name, an IP address, or a subnet.

In the same function, for the subnet case, you're using port 0 instead of the 
specified port. Looks like a mistake?



{code}
+  LOG.warn("Invalid address given " + addrString);
{code}
Nit: add a ':' to the log message


{code}
+// If the datanode registered with an address we can't use
+// then use the address the IPC came in on instead
+if (NetUtils.isWildcardOrLoopback(nodeReg.getIpAddr())) {
{code}

I found this comment a little unclear. Under what circumstance would the DN 
pass a loopback or wildcard IP? Aren't they filtered on the DN side? I think 
this should be at least a WARN, or maybe even throw an exception to disallow 
the registration.

Edit: I got to the part later in the patch where the DN potentially sends a 
wildcard to the NN. I think it might be simpler to have the DN send an empty 
list to the NN if it's bound to wildcard -- and adjust the comment here to 
explain why it would be registering with no addresses.



{code}
+  // TODO: haven't determined the port yet, using default
{code}
Are you planning another patch to fix this on the branch before merging? What's 
the backward-compatibility path with the existing configurations for bind 
address, etc, where the port's specified? We should be clear about which takes 
precedence, and throw errors on startup if both are configured, I think?

Maybe it makes sense to change these to just be InetAddress instead of 
InetSocketAddress, and never fill in a port there?

This patch should add the new config to hdfs-default, and edit the existing 
config's documentation to explain how the two interact.


{code}
+  if (0 != interfaceStrs.length) {
+    LOG.info("Using interfaces [" +
+    Joiner.on(',').join(interfaceStrs) + "] with addresses [" +
+    Joiner.on(',').join(interfaceAddrs) + "]");
+  }
{code}

- need indentation for the joiner lines
- add comment explaining how this eventually gets filled in, if it's empty?


{code}
+   * @param addrs socket addresses to convert
+   * @return an array of strings of IPs for the given addresses
+   */
+  public static String[] toIpAddrStrings(InetSocketAddress[] addrs) {
{code}

javadoc should specify that ports aren't included in the stringification of 
addresses

 Datanode should be able to register multiple network interfaces
 ---

 Key: HDFS-3146
 URL: https://issues.apache.org/jira/browse/HDFS-3146
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3146.txt


 The Datanode should register multiple interfaces with the Namenode (who then 
 forwards them to clients). We can do this by extending the DatanodeID, which 
 currently just contains a single interface, to contain a list of interfaces. 
 For compatibility, the DatanodeID method to get the DN address for data 
 transfer should remain unchanged (multiple interfaces are only used where the 
 client explicitly takes advantage of them).
 By default, if the Datanode binds on all interfaces (via using the wildcard 
 in the dfs*address configuration) all interfaces are exposed, modulo ones 
 like the loopback that should never be exposed. Alternatively, a new 
 configuration parameter ({{dfs.datanode.available.interfaces}}) allows the 
 set of interfaces can be specified explicitly in case the user only wants to 
 expose a subset. If the new default behavior is too disruptive we could 
 default dfs.datanode.available.interfaces to be the IP of the IPC interface 
 which is the only interface exposed today (per HADOOP-6867, only the port 
 from dfs.datanode.address is used today). 
 The interfaces can be specified by name (eg eth0), subinterface name (eg 
 eth0:0), or IP address. The IP address can be specified by range using CIDR 
 notation so the configuration values are portable.
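Matching an address against a CIDR spec of the kind described above comes down to simple mask arithmetic. The following is an IPv4-only illustrative sketch, not the patch's code, and the class and method names are invented:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// IPv4-only illustration of checking whether an address falls inside a
// CIDR range, as portable interface specs like "192.168.1.0/24" require.
public class CidrMatch {
    static boolean inRange(String cidr, String ip) throws UnknownHostException {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        int net = toInt(InetAddress.getByName(parts[0]).getAddress());
        int addr = toInt(InetAddress.getByName(ip).getAddress());
        // A /0 prefix matches everything; otherwise keep the top bits.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (net & mask) == (addr & mask);
    }

    private static int toInt(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
             | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(inRange("192.168.1.0/24", "192.168.1.42")); // true
        System.out.println(inRange("192.168.1.0/24", "192.168.2.1"));  // false
    }
}
```

Since the inputs are literal IPs, InetAddress.getByName performs no DNS lookup here.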





[jira] [Commented] (HDFS-3216) DatanodeID should support multiple IP addresses

2012-04-08 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249640#comment-13249640
 ] 

Todd Lipcon commented on HDFS-3216:
---

Should we deprecate this function? Or do we need some concept of the 
canonical/main IP address? If the latter, we should explain this in the 
javadoc of this function.

{code}
   public String getIpAddr() {
   -return ipAddr;
   +return ipAddrs[0];
   +  }
{code}



- is it ever valid to construct a DatanodeID with no IP addresses? If not we 
should add a Preconditions check or at least an assert on the length of the 
ipAddrs array in the constructor and the setter
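The suggested constructor check could look something like the sketch below. This uses a plain argument check rather than Guava's Preconditions, and the class is a simplified stand-in for DatanodeID, not the real one:

```java
// Simplified stand-in for DatanodeID, showing the suggested validation
// that at least one IP address is always present.
public class DatanodeIdSketch {
    private final String[] ipAddrs;

    public DatanodeIdSketch(String[] ipAddrs) {
        if (ipAddrs == null || ipAddrs.length == 0) {
            throw new IllegalArgumentException(
                "DatanodeID requires at least one IP address");
        }
        this.ipAddrs = ipAddrs.clone();
    }

    /** The first address serves as the canonical one. */
    public String getIpAddr() {
        return ipAddrs[0];
    }
}
```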



{code}
+return new DatanodeID(ipAddrs.toArray(new String[ipAddrs.size()]) , 
dn.getHostName(), dn.getStorageID(),
 dn.getXferPort(), dn.getInfoPort(), dn.getIpcPort());
{code}
Can you re-wrap this to 80chars?



- Is the code change in JsonUtil covered by TestJsonUtil? (are you sure that 
the cast to String[] is right?)

- in some of the tests, it's filling in hostnames instead of IPs for the 
ipAddrs field. Is that right, or do we expect that it will always be resolved 
IPs? The dual nature makes me nervous.


 DatanodeID should support multiple IP addresses
 ---

 Key: HDFS-3216
 URL: https://issues.apache.org/jira/browse/HDFS-3216
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3216.txt


 The DatanodeID has a single field for the IP address, for HDFS-3146 we need 
 to extend it to support multiple addresses.





[jira] [Commented] (HDFS-3218) Use multiple remote DN interfaces for block transfer

2012-04-08 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249641#comment-13249641
 ] 

Todd Lipcon commented on HDFS-3218:
---

- I think it would make sense to add a utility function like 
{{DFSUtil.getRandomXferAddress(DatanodeID)}}, since you have a lot of 
repetition of the {{DFSUtil.getRandom().nextInt}} stuff. Or even make it a 
member function of the DatanodeID?

Otherwise looks good.


 Use multiple remote DN interfaces for block transfer
 

 Key: HDFS-3218
 URL: https://issues.apache.org/jira/browse/HDFS-3218
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs client
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3218.txt


 HDFS-3146 and HDFS-3216 expose multiple DN interfaces to the client. In order 
 for clients, in aggregate, to use multiple DN interfaces clients should pick 
 different interfaces when transferring blocks. Given that we cache client - 
 DN connections the policy of picking a remote interface at random for each 
 new connection seems best (vs round robin for example). In the future we 
 could make the client congestion aware. We could also establish multiple 
 connections between the client and DN and therefore use multiple interfaces 
 for a single block transfer. Both of those are out of scope for this jira.
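The pick-at-random policy described above amounts to very little code. A hedged sketch of the kind of helper suggested earlier in the thread (the name pickRandom and the class are hypothetical, not the patch's actual getRandomXferAddress):

```java
import java.util.Random;

// Illustrative helper for choosing a remote DN interface at random for
// each new cached connection, per the policy described in this issue.
public class AddrPicker {
    static String pickRandom(String[] addrs, Random rand) {
        if (addrs.length == 0) {
            throw new IllegalArgumentException("no addresses to choose from");
        }
        return addrs[rand.nextInt(addrs.length)];
    }

    public static void main(String[] args) {
        String[] addrs = {"10.0.0.1:50010", "10.0.1.1:50010"};
        // Each new connection picks independently, so clients in aggregate
        // spread load across the DN's interfaces.
        System.out.println(pickRandom(addrs, new Random()));
    }
}
```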





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-08 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249646#comment-13249646
 ] 

Todd Lipcon commented on HDFS-2983:
---

bq. Technically the quote characters inside the Javadoc should be &quot; - or 
you could just use single-quotes instead to avoid the hassle.

erg, JIRA went and formatted my explanation :) The quote characters should be 
{{ &quot;}} without the space.

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-08 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249645#comment-13249645
 ] 

Todd Lipcon commented on HDFS-2983:
---

- Can VersionUtil be made abstract, since it only has static methods?



{code}
+   * This function splits the two versions on . and performs a lexical
+   * comparison of the resulting components.
{code}

Technically the quote characters inside the Javadoc should be &quot; - or you 
could just use single-quotes instead to avoid the hassle.



VersionUtil should be doing numeric comparison rather than straight string 
comparison. For example 10.0.0 should be considered greater than 2.0, but I 
think the current implementation doesn't implement this correctly.

Please add a test for this case to TestVersionUtil as well.


{code}
+  private static void assertExpectedValues(String lower, String higher) {
+    assertTrue(0 > VersionUtil.compareVersions(lower, higher));
+    assertTrue(0 < VersionUtil.compareVersions(higher, lower));
+  }
{code}
These comparisons read backwards to me. ie should be:
{code}
+  private static void assertExpectedValues(String lower, String higher) {
+    assertTrue(VersionUtil.compareVersions(lower, higher) < 0);
+    assertTrue(VersionUtil.compareVersions(higher, lower) > 0);
+  }
{code}
don't you think?



{code}
+    if (VersionUtil.compareVersions(dnVersion, minimumDataNodeVersion) < 0) {
+      IncorrectVersionException ive = new IncorrectVersionException(
+          minimumDataNodeVersion, dnVersion, "DataNode", "NameNode");
+      LOG.warn(ive.getMessage());
+      throw ive;
+    }
{code}

Here, does the log message end up including the remote IP address somehow? If 
not, I think we should improve it to include that (and maybe the stringified 
DatanodeRegistration object)


 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249181#comment-13249181
 ] 

Todd Lipcon commented on HDFS-3192:
---

The state diagram is included in the design doc attached to HDFS-2185. Please 
comment with an example scenario in which you think there is an incorrect 
behavior - I don't know of any aside from HADOOP-8217, but if you know of some 
I'd be really happy to address them rather than find out about them from a 
broken customer :)

 Active NN should exit when it has not received a getServiceStatus() rpc from 
 ZKFC for timeout secs
 --

 Key: HDFS-3192
 URL: https://issues.apache.org/jira/browse/HDFS-3192
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude







[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249182#comment-13249182
 ] 

Todd Lipcon commented on HDFS-2983:
---

bq. The proposal seems to suggest that the NN does not need to be updated if 
desired. Correct?

Yes, I think that's correct, and desired. Sometimes upgrades only address the 
slave nodes, so there's no sense having to change the NN. Of course, with HA, 
upgrading the NN isn't as big a problem, but even so it is a more 
complicated/delicate operation.

bq. I see why it is desirable but can we simplify things or make upgrades 
safer if we drop that requirement?

I don't know if it makes things much simpler. I think adding a requirement that 
the NN upgrade before the DNs is quite inconvenient for operators. But I am not 
100% sure of this, and willing to be convinced :)

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-based RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 2-3 minor releases or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3229) add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode.

2012-04-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249183#comment-13249183
 ] 

Todd Lipcon commented on HDFS-3229:
---

I'd recommend reusing the code/protobufs for the existing getEditLogManifest() 
calls that the 2NN uses to transfer logs, here.

 add JournalProtocol RPCs to list finalized edit segments, and read edit 
 segment file from JournalNode. 
 ---

 Key: HDFS-3229
 URL: https://issues.apache.org/jira/browse/HDFS-3229
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Brandon Li
Assignee: Brandon Li







[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248587#comment-13248587
 ] 

Todd Lipcon commented on HDFS-3212:
---

bq. Todd, if you are referring to creating an edit log with the name format 
edit_log_<epoch_num>_in_progress or, when finalized, 
edit_log_<epoch_number>_<start_txid>-<end_txid>, it is a better solution than 
creating a separate metadata file.

Sure, that works too. Except you'll have to change a ton of FileJournalManager 
code paths to do this...

bq. Otherwise, Suresh's solution in adding the epoch number in start log 
segment sounds good.

I still think that's really wrong, because transaction _data_ is separate from 
transaction _storage_. Epoch numbers are a storage layer thing.

bq. Actually, for debugging purposes, we should add more information such as 
time when the journal was started, NN id of owner etc along with epoch number

I agree with all of the above, except for the epoch number. The timestamp, NN 
id, hostname, etc, are all NN-layer things, whereas the epoch number is an 
edits storage layer thing.

 Persist the epoch received by the JournalService
 

 Key: HDFS-3212
 URL: https://issues.apache.org/jira/browse/HDFS-3212
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: Shared journals (HDFS-3092)
Reporter: Suresh Srinivas

 epoch received over JournalProtocol should be persisted by JournalService.





[jira] [Commented] (HDFS-3217) ZKFC should restart NN when healthmonitor gets a SERVICE_NOT_RESPONDING exception

2012-04-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248589#comment-13248589
 ] 

Todd Lipcon commented on HDFS-3217:
---

I disagree. It is an explicit decision to not have the ZKFC act as a service 
supervisor, because it adds a lot of complexity. There already exist lots of 
solutions for service management - we assume that the user is already using 
something like puppet, daemontools, supervisord, cron, etc, to make sure the 
daemon restarts eventually.

 ZKFC should restart NN when healthmonitor gets a SERVICE_NOT_RESPONDING 
 exception
 -

 Key: HDFS-3217
 URL: https://issues.apache.org/jira/browse/HDFS-3217
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: auto-failover, ha
Reporter: Hari Mankude
Assignee: Hari Mankude







[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248595#comment-13248595
 ] 

Todd Lipcon commented on HDFS-2983:
---

Here's another proposal which I think makes sense:

1) Ensure that version compatibility is checked both on the NN side and the DN 
side. So, when the DN first connects to the NN, the NN verifies the DN's 
version. If it is deemed incompatible, it is rejected. Then, the DN verifies 
the NN version in the response. If it is deemed incompatible, it does not 
proceed with registration.

2) Add a function to compare two version numbers in the straightforward manner: 
split the numbers on ".", then, componentwise, do comparisons according to 
numerical value (like sort -n). Some examples: 2.0.1 > 2.0.0, 10.0 > 2.0.0, 
2.0.0a > 2.0.0, 2.0.0b > 2.0.0a. (This is the comparison mechanism package 
managers tend to use.)

3) In hdfs-default.xml, add a configuration like 
{{cluster.min.supported.version}}. In branch-2, we set this to 2.0.0. So, by 
default, any 2.x.x can talk to any other 2.x.x. When we release 3.x.x, if it is 
incompatible with 2.x.x, then we just need to bump that config in 3.0's 
hdfs-default.xml.

This supports the following use cases/requirements:
- rolling upgrade can be done for most users without having to change any 
configs.
- new versions of Hadoop can be marked incompatible with old versions of Hadoop
- cluster admins can still override it if they want to disallow older nodes 
from connecting. For example, imagine there is a critical security bug fixed in 
2.0.0a - the admin can set the config to 2.0.0a, and then 2.0.0 nodes may no 
longer join the cluster (even though they are protocol-wise compatible)

Thoughts?
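The componentwise comparison in step 2 could be sketched like this (class and method names are illustrative only, not part of any Hadoop API):

```java
/** Sketch of the componentwise version comparison proposed in step 2 above.
 *  Names are illustrative; this is not the actual Hadoop implementation. */
public class VersionCompare {
    public static int compare(String a, String b) {
        String[] as = a.split("\\."), bs = b.split("\\.");
        int n = Math.max(as.length, bs.length);
        for (int i = 0; i < n; i++) {
            String x = i < as.length ? as[i] : "";
            String y = i < bs.length ? bs[i] : "";
            int c = compareComponent(x, y);
            if (c != 0) return c;
        }
        return 0;
    }

    // Compare the numeric prefix numerically (like sort -n), then any
    // trailing letters lexicographically, so "2.0.0a" > "2.0.0".
    private static int compareComponent(String x, String y) {
        int c = Long.compare(numericPrefix(x), numericPrefix(y));
        return c != 0 ? c : suffix(x).compareTo(suffix(y));
    }

    private static long numericPrefix(String s) {
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
        return i == 0 ? 0 : Long.parseLong(s.substring(0, i));
    }

    private static String suffix(String s) {
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
        return s.substring(i);
    }
}
```

With this, the examples above hold: 2.0.1 sorts after 2.0.0, 10.0 after 2.0.0, and a lettered suffix sorts after the bare version.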

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-base RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.

2012-04-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248667#comment-13248667
 ] 

Todd Lipcon commented on HDFS-3222:
---

Nice catch, Uma. I think we can use the length field of the block in the NN 
metadata to solve this, right? The first hflush()/sync() call from the client 
will cause persistBlocks() to be called, which should write down the block with 
a non-zero length. Then on restart, we can use this length instead of 0 when 
the replicas aren't found.
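The fallback described above could be sketched as follows (method and parameter names are hypothetical, not the actual DFSInputStream code):

```java
/** Hypothetical sketch of the fallback described above: if no DN has yet
 *  reported the last partial block after a restart, use the length the NN
 *  persisted at the first hflush()/sync() instead of silently returning 0. */
public class LastBlockLength {
    public static long resolve(long[] dnReportedLengths, long nnPersistedLength) {
        if (dnReportedLengths.length == 0) {
            // No replica locations reported yet: the NN-persisted length is a
            // better lower bound than 0.
            return nnPersistedLength;
        }
        long best = 0;
        for (long len : dnReportedLengths) {
            best = Math.max(best, len);
        }
        return best;
    }
}
```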

 DFSInputStream#openInfo should not silently get the length as 0 when 
 locations length is zero for last partial block.
 -

 Key: HDFS-3222
 URL: https://issues.apache.org/jira/browse/HDFS-3222
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 1.0.3, 2.0.0, 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G

 I have seen one situation with Hbase cluster.
 Scenario is as follows:
 1)1.5 blocks has been written and synced.
 2)Suddenly cluster has been restarted.
 Reader opened the file and trying to get the length., By this time partial 
 block contained DNs are not reported to NN. So, locations for this partial 
 block would be 0. In this case, DFSInputStream assumes that, 1 block size as 
 final size.
 But reader also assuming that, 1 block size is the final length and setting 
 his end marker. Finally reader ending up reading only partial data. Due to 
 this, HMaster could not replay the complete edits. 
 Actually this happened with the 0.20 version. Looking at the code, the same 
 issue should be present in trunk as well.
 {code}
 int replicaNotFoundCount = locatedblock.getLocations().length;

 for (DatanodeInfo datanode : locatedblock.getLocations()) {
   ..
   ..
 }

 // Namenode told us about these locations, but none know about the replica
 // means that we hit the race between pipeline creation start and end.
 // we require all 3 because some other exception could have happened
 // on a DN that has it.  we want to report that error
 if (replicaNotFoundCount == 0) {
   return 0;
 }
 {code}





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248716#comment-13248716
 ] 

Todd Lipcon commented on HDFS-2983:
---

Hey Sanjay. I agree that the snapshot-on-upgrade feature is really important, 
and I don't think this work precludes/breaks that. Here's my line of thinking:
- even with the ability to do rolling upgrade, there is no restriction that you 
_must_ do upgrades like this. So, you could still decide to use the current 
upgrade process as a policy decision.
- As you mentioned, many upgrades/hotfixes/EBFs don't touch core code, so for 
those, most people would prefer a rolling upgrade without downtime.
- separately, after this is committed, we can work on figuring out a strategy 
that allows you to do an upgrade-style snapshot before starting the rolling 
upgrade. It looks like you just filed HDFS-3225 for this, so let's continue 
this discussion there.

Agree?

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-base RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3203) Currently,The Checkpointer is controled by time.in this way,it must be checkponit in that it is only one transaction in checkpoint period.I think it need add file size t

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247068#comment-13247068
 ] 

Todd Lipcon commented on HDFS-3203:
---

You can configure dfs.namenode.checkpoint.txns to the desired number of 
transactions, and then set dfs.namenode.checkpoint.period to a very high value. 
This will give you the desired behavior. Does that not satisfy your 
requirements?
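For example, the transaction-count-driven behavior described above could be configured like this in hdfs-site.xml (the values are illustrative):

```xml
<!-- Checkpoint after every 1M transactions; the time-based trigger is
     pushed out to a day (in seconds) so it effectively never fires first. -->
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>86400</value>
</property>
```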

 Currently, the Checkpointer is controlled by time; this way, it must 
 checkpoint even when there is only one transaction in the checkpoint period. I 
 think we need to add file size to control checkpointing
 --

 Key: HDFS-3203
 URL: https://issues.apache.org/jira/browse/HDFS-3203
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.24.0, 2.0.0
Reporter: liaowenrui







[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247071#comment-13247071
 ] 

Todd Lipcon commented on HDFS-3150:
---

Sorry, I should have said +1 assuming these changes are addressed in my above 
comment. Since Eli addressed my comments, here's my official +1 for the patch.

 Add option for clients to contact DNs via hostname in branch-1
 --

 Key: HDFS-3150
 URL: https://issues.apache.org/jira/browse/HDFS-3150
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, hdfs client
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 1.1.0

 Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt


 Per the document attached to HADOOP-8198, this is just for branch-1, and 
 unbreaks DN multihoming. The datanode can be configured to listen on a bond, 
 or all interfaces by specifying the wildcard in the dfs.datanode.*.address 
 configuration options, however per HADOOP-6867 only the source address of the 
 registration is exposed to clients. HADOOP-985 made clients access datanodes 
 by IP primarily to avoid the latency of a DNS lookup, this had the side 
 effect of breaking DN multihoming. In order to fix it let's add back the 
 option for Datanodes to be accessed by hostname. This can be done by:
 # Modifying the primary field of the Datanode descriptor to be the hostname, 
 or 
 # Modifying Client/Datanode -> Datanode access to use the hostname field 
 instead of the IP
 I'd like to go with approach #2 as it does not require making an incompatible 
 change to the client protocol, and is much less invasive. It minimizes the 
 scope of modification to just places where clients and Datanodes connect, vs 
 changing all uses of Datanode identifiers.
 New client and Datanode configuration options are introduced:
 - {{dfs.client.use.datanode.hostname}} indicates all client to datanode 
 connections should use the datanode hostname (as clients outside cluster may 
 not be able to route the IP)
 - {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should 
 use hostnames when connecting to other Datanodes for data transfer
 If the configuration options are not used, there is no change in the current 
 behavior.
 I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the 
 use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc) 
 based on the context the ID is being used in, vs always using the IP:xferPort 
 as the Datanode's name, and using the name everywhere.





[jira] [Commented] (HDFS-3204) Minor modification to JournalProtocol.proto to make it generic

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247431#comment-13247431
 ] 

Todd Lipcon commented on HDFS-3204:
---

A few small typos:

+  optional uint32 namespceID = 3;// Namespace ID
and here:
+// convertion happens for messages from Namenode to Journal receivers.

otherwise seems good modulo investigating TestBackupNode failure

 Minor modification to JournalProtocol.proto to make it generic
 --

 Key: HDFS-3204
 URL: https://issues.apache.org/jira/browse/HDFS-3204
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
 Attachments: HDFS-3204.txt


 JournalProtocol.proto uses NamenodeRegistration in methods such as journal() 
 for identifying the source. I want to make it generic so that the method can 
 be called with journal information to identify the journal. I plan to use the 
 protocol also for sync purposes, where the source of the journal can be some 
 thing other than namenode.





[jira] [Commented] (HDFS-3203) Currently,The Checkpointer is controled by time.in this way,it must be checkponit in that it is only one transaction in checkpoint period.I think it need add file size t

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247441#comment-13247441
 ] 

Todd Lipcon commented on HDFS-3203:
---

I'm sorry, I don't understand your question. Can you please clarify?

 Currently, the Checkpointer is controlled by time; this way, it must 
 checkpoint even when there is only one transaction in the checkpoint period. I 
 think we need to add file size to control checkpointing
 --

 Key: HDFS-3203
 URL: https://issues.apache.org/jira/browse/HDFS-3203
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.24.0, 2.0.0
Reporter: liaowenrui







[jira] [Commented] (HDFS-3161) 20 Append: Excluded DN replica from recovery should be removed from DN.

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247446#comment-13247446
 ] 

Todd Lipcon commented on HDFS-3161:
---

If this is 0.20-append specific, we've recently decided to disable the append() 
call in that branch (and only support sync()). So, I don't think the 
append-related scenario is worth worrying about (assuming it works correctly in 
the trunk implementation)

 20 Append: Excluded DN replica from recovery should be removed from DN.
 ---

 Key: HDFS-3161
 URL: https://issues.apache.org/jira/browse/HDFS-3161
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: suja s
Priority: Critical
 Fix For: 1.0.3


 1) DN1-DN2-DN3 are in pipeline.
 2) Client killed abruptly
 3) one DN has restarted , say DN3
 4) In DN3 info.wasRecoveredOnStartup() will be true
 5) NN recovery triggered, DN3 skipped from recovery due to above check.
 6) Now DN1 and DN2 have blocks with generation stamp 2 and DN3 has an older 
 generation stamp, say 1, and DN3 also still has this block entry in 
 ongoingCreates
 7) as part of recovery file has closed and got only two live replicas ( from 
 DN1 and DN2)
 8) So, NN issued the command for replication. Now DN3 also has the replica 
 with newer generation stamp.
 9) Now DN3 contains 2 replicas on disk. and one entry in ongoing creates with 
 referring to blocksBeingWritten directory.
 When we call append/ leaseRecovery, it may again skip this node for that 
 recovery as blockId entry still presents in ongoingCreates with startup 
 recovery true.
 It may keep up this dance for every recovery.
 And this stale replica will not be cleaned until we restart the cluster. 
 The actual replica will be transferred to this node only through the 
 replication process.
 Also unnecessarily that replicated blocks will get invalidated after next 
 recoveries





[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247472#comment-13247472
 ] 

Todd Lipcon commented on HDFS-2983:
---

Maybe I'm misunderstanding, but I thought the plan for this JIRA was to add a 
more structured version number with major/minor/patch components. Then have the 
check still verify that the major/minor match up, but not verify the patch 
level and svn revision? That is to say, we should loosen the restriction but 
not entirely drop it.

 Relax the build version check to permit rolling upgrades within a release
 -

 Key: HDFS-2983
 URL: https://issues.apache.org/jira/browse/HDFS-2983
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Aaron T. Myers
 Attachments: HDFS-2983.patch


 Currently the version check for DN/NN communication is strict (it checks the 
 exact svn revision or git hash, Storage#getBuildVersion calls 
 VersionInfo#getRevision), which prevents rolling upgrades across any 
 releases. Once we have the PB-base RPC in place (coming soon to branch-23) 
 we'll have the necessary pieces in place to loosen this restriction, though 
 perhaps it takes another 23 minor release or so before we're ready to commit 
 to making the minor versions compatible.





[jira] [Commented] (HDFS-3211) JournalProtocol changes required for introducing epoch and fencing

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247790#comment-13247790
 ] 

Todd Lipcon commented on HDFS-3211:
---

Hi Suresh. Have you looked at HDFS-3189? Hopefully we can make our protocols 
similar with the intent of eventually merging the two implementations.

 JournalProtocol changes required for introducing epoch and fencing
 --

 Key: HDFS-3211
 URL: https://issues.apache.org/jira/browse/HDFS-3211
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: Shared journals (HDFS-3092)
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

 JournalProtocol changes to introduce epoch in every request. Adding new 
 method fence for fencing a JournalService. On BackupNode fence is a no-op. 





[jira] [Commented] (HDFS-3211) JournalProtocol changes required for introducing epoch and fencing

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247800#comment-13247800
 ] 

Todd Lipcon commented on HDFS-3211:
---

Hi Suresh. You need to store the epoch persistently on disk to handle the case 
of journal daemon restarts, I think. HDFS-3190 does a refactor to add a utility 
class you can use for this.

 JournalProtocol changes required for introducing epoch and fencing
 --

 Key: HDFS-3211
 URL: https://issues.apache.org/jira/browse/HDFS-3211
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: Shared journals (HDFS-3092)
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: HDFS-3211.txt, HDFS-3211.txt


 JournalProtocol changes to introduce epoch in every request. Adding new 
 method fence for fencing a JournalService. On BackupNode fence is a no-op. 





[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247805#comment-13247805
 ] 

Todd Lipcon commented on HDFS-3110:
---

Hey Henry. The patch looks good, but I can't figure out how to run the test. 
When I do mvn -Pnative -DskipTests install, it builds libhdfs, but doesn't 
build the hdfs_test binary. Can you post instructions on how to run the test 
manually? Then we can do another jira to make it more automatic.

 libhdfs implementation of direct read API
 -

 Key: HDFS-3110
 URL: https://issues.apache.org/jira/browse/HDFS-3110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: libhdfs
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 0.24.0

 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, 
 HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch


 Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, 
 which leads to significant performance increases when reading local data from 
 C.





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247847#comment-13247847
 ] 

Todd Lipcon commented on HDFS-3212:
---

I don't think it's reasonable to put the epoch number inside the START 
transaction, because that leaks the idea of epochs out of the journal manager 
layer into the NN layer.

Also, if the JN restarts, when it comes up, how do you make sure that an old NN 
doesn't come back to life with a startLogSegment transaction?

I think you need to record the epoch number separately from the idea of 
segments, for fencing purposes, since you aren't always guaranteed to be in the 
middle of a segment, and you don't want disagreement about who gets to call 
startLogSegment.

 Persist the epoch received by the JournalService
 

 Key: HDFS-3212
 URL: https://issues.apache.org/jira/browse/HDFS-3212
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: Shared journals (HDFS-3092)
Reporter: Suresh Srinivas

 epoch received over JournalProtocol should be persisted by JournalService.





[jira] [Commented] (HDFS-3213) JournalDaemon (server) should persist the cluster id and nsid in the storage directory

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247849#comment-13247849
 ] 

Todd Lipcon commented on HDFS-3213:
---

I'm assuming you'll use StorageDirectory here, which will take care of this all 
for you, right?

 JournalDaemon (server) should persist the cluster id and nsid in the storage 
 directory
 --

 Key: HDFS-3213
 URL: https://issues.apache.org/jira/browse/HDFS-3213
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude







[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247871#comment-13247871
 ] 

Todd Lipcon commented on HDFS-3212:
---

bq. Is it the case that JN will reject it since the old NN has a smaller epoch?

Right -- that's why it needs to persist, IMO.

bq. 2. might be less optimal because now it consists of 2 operations. 1) 
rolling the log and creating a new segment 2) updating a metadata file.

I think it's just a matter of getting the ordering right. Before starting a log 
segment, you need to fence prior writers. The fencing step is what writes down 
the epoch. Then, when you create a new log segment, you tag it (e.g. by storing 
it in a directory per-epoch, or by writing a metadata file next to it before 
you create the file). I think this is sufficiently atomic.

bq. So 2 edit logs with same txid but can be differentiated using epochs

I've had another idea which I want to write up in the design doc. But, 
basically, I think we can solve this problem more simply by the following:
- Currently, when FSEditLog starts a new segment, it calls 
journal.startLogSegment(), then journal.logEdit(StartLogSegmentOp), then 
journal.logSync(). So there is a point of time when the log segment is empty, 
with no transactions. If instead, we changed it so that the startLogSegment() 
call was responsible for writing the first transaction (and only the first), 
atomically, then we might not have a problem. We just have to make the 
restriction that the first transaction of any segment is always deterministic 
(e.g. just START_LOG_SEGMENT(txid) and nothing else).

Let me revise the design doc in HDFS-3077 with this idea to see if it works 
when fully fleshed out.
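The proposal above can be sketched as follows. This is an illustrative toy, not the real FSEditLog API; the point is only that startLogSegment() itself writes the deterministic first transaction, so no observer ever sees an empty segment:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: startLogSegment() creates the segment and writes its
// deterministic first op in one step, instead of the current three-step
// startLogSegment() / logEdit(StartLogSegmentOp) / logSync() sequence.
public class AtomicSegmentStart {
    static final String START_OP = "START_LOG_SEGMENT";
    private final List<String> currentSegment = new ArrayList<>();

    /** Creates the segment and atomically writes its first transaction. */
    public void startLogSegment(long txid) {
        currentSegment.clear();
        currentSegment.add(START_OP + "(" + txid + ")");  // single atomic step
    }

    public List<String> segment() {
        return currentSegment;
    }

    public static void main(String[] args) {
        AtomicSegmentStart log = new AtomicSegmentStart();
        log.startLogSegment(17);
        // The segment is never observable empty after startLogSegment returns.
        if (log.segment().isEmpty()) throw new AssertionError();
    }
}
```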


 Persist the epoch received by the JournalService
 

 Key: HDFS-3212
 URL: https://issues.apache.org/jira/browse/HDFS-3212
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: Shared journals (HDFS-3092)
Reporter: Suresh Srinivas

 epoch received over JournalProtocol should be persisted by JournalService.





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247894#comment-13247894
 ] 

Todd Lipcon commented on HDFS-3212:
---

bq. I do not understand what you mean by NN layer. Epoch is a notion from 
JournalManager to the JournalNode. Both need to understand this and provide 
appropriate guarantees.

Currently, the NN code when starting a new log segment looks like this:
{code}
  editLogStream = journalSet.startLogSegment(segmentTxId);
...
if (writeHeaderTxn) {
  logEdit(LogSegmentOp.getInstance(
  FSEditLogOpCodes.OP_START_LOG_SEGMENT));
  logSync();
}
{code}

So the operation of starting a segment, and writing the OP_START_LOG_SEGMENT 
transaction are separate. In general, the JournalManager abstraction doesn't 
know about the contents of the edits it's writing -- it's just responsible for 
bytes. If you wanted to include the epoch number in the OP_START_LOG_SEGMENT 
transaction, you'd have to have the NN code do something like 
{{journalManager.getCurrentEpoch()}}, and then feed that into the logEdit call. 
But that's not very generic, so it seems like a leak of abstraction.

bq. Whether you store it in a directory per-epoch or record it in the 
startlogSegment record at the beginning of the segment - they are essentially 
the same.

I agree, if you're talking about prefixing it at the beginning of the file, 
before the first transaction. But, if you're talking about actually putting it 
in the content of the first transaction, I think it's a bad idea for the reason 
above. My preference is to keep it separated from the file, so that the files 
written by JournalDaemon are exactly identical to the files that would be 
written by FileJournalManager. That allows you to copy to and from the 
different types of nodes without any difference in format.
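The "keep it separated from the file" option can be sketched as below. File names and layout here are made up for illustration; the real point is that the segment bytes stay identical to what FileJournalManager would write, with the epoch recorded in a sibling file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: record the epoch beside the segment file rather than
// inside it, so the segment's format matches a local FileJournalManager's
// byte-for-byte and files can be copied between node types unchanged.
public class SideEpochFile {
    public static Path writeSegmentWithEpoch(Path dir, long firstTxId,
            long epoch, byte[] edits) throws IOException {
        Files.createDirectories(dir);
        // Epoch goes in a sibling metadata file, not prefixed into the segment.
        Files.write(dir.resolve("epoch_" + firstTxId),
                Long.toString(epoch).getBytes());
        Path segment = dir.resolve("edits_inprogress_" + firstTxId);
        Files.write(segment, edits);  // identical format to a local journal
        return segment;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jn");
        Path seg = writeSegmentWithEpoch(dir, 101, 7, new byte[]{1, 2, 3});
        if (!Files.exists(dir.resolve("epoch_101"))) throw new AssertionError();
        if (Files.readAllBytes(seg).length != 3) throw new AssertionError();
    }
}
```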

 Persist the epoch received by the JournalService
 

 Key: HDFS-3212
 URL: https://issues.apache.org/jira/browse/HDFS-3212
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: Shared journals (HDFS-3092)
Reporter: Suresh Srinivas

 epoch received over JournalProtocol should be persisted by JournalService.





[jira] [Commented] (HDFS-3203) Currently, the Checkpointer is controlled only by time; a checkpoint occurs even when there is only one transaction in the checkpoint period. A file-size threshold should also be added to control checkpointing

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248023#comment-13248023
 ] 

Todd Lipcon commented on HDFS-3203:
---

bq. 1.dfs.namenode.checkpoint.txns is not used in ha

It is supposed to be used. See this code:
{code}
  if (uncheckpointed >= checkpointConf.getTxnCount()) {
LOG.info("Triggering checkpoint because there have been " + 
uncheckpointed + " txns since the last checkpoint, which " +
"exceeds the configured threshold " +
checkpointConf.getTxnCount());
needCheckpoint = true;
  } else if (secsSinceLast >= checkpointConf.getPeriod()) {
LOG.info("Triggering checkpoint because it has been " +
secsSinceLast + " seconds since the last checkpoint, which " +
"exceeds the configured interval " + 
checkpointConf.getPeriod());
needCheckpoint = true;
  }
{code}
If it is not working, please explain how to reproduce.

bq. 2.why standbyCheckpointer is running in active namenode and standby 
namenode?

The daemon only runs when the node is in standby mode. When it becomes active, 
it stops the checkpointer.


 Currently, the Checkpointer is controlled only by time; a checkpoint occurs 
 even when there is only one transaction in the checkpoint period. A 
 file-size threshold should also be added to control checkpointing
 --

 Key: HDFS-3203
 URL: https://issues.apache.org/jira/browse/HDFS-3203
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.24.0, 2.0.0
Reporter: liaowenrui







[jira] [Commented] (HDFS-3178) Add states for journal synchronization in journal daemon

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248057#comment-13248057
 ] 

Todd Lipcon commented on HDFS-3178:
---

Hey folks. I noticed there's a branch for HDFS-3092, but this got committed to 
trunk. Was that on purpose?

 Add states for journal synchronization in journal daemon
 

 Key: HDFS-3178
 URL: https://issues.apache.org/jira/browse/HDFS-3178
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 3.0.0

 Attachments: h3178_20120403_svn_mv.patch, h3178_20120404.patch, 
 h3178_20120404_svn_mv.patch, h3178_20120404b_svn_mv.patch, 
 h3178_20120405.patch, h3178_20120405_svn_mv.patch, svn_mv.sh


 Journal in a new daemon has to be synchronized to the current transaction.  
 It requires new states such as WaitingForRoll, Syncing and Synced.





[jira] [Commented] (HDFS-3178) Add states for journal synchronization in journal daemon

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248082#comment-13248082
 ] 

Todd Lipcon commented on HDFS-3178:
---

No concern, just thought it might have been a mistake. Carry on :)

 Add states for journal synchronization in journal daemon
 

 Key: HDFS-3178
 URL: https://issues.apache.org/jira/browse/HDFS-3178
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 3.0.0

 Attachments: h3178_20120403_svn_mv.patch, h3178_20120404.patch, 
 h3178_20120404_svn_mv.patch, h3178_20120404b_svn_mv.patch, 
 h3178_20120405.patch, h3178_20120405_svn_mv.patch, svn_mv.sh


 Journal in a new daemon has to be synchronized to the current transaction.  
 It requires new states such as WaitingForRoll, Syncing and Synced.





[jira] [Commented] (HDFS-3084) FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well as host:port

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246457#comment-13246457
 ] 

Todd Lipcon commented on HDFS-3084:
---

No, the scripts and failover controllers use keytab-based or straight user 
credentials.

 FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well 
 as host:port
 -

 Key: HDFS-3084
 URL: https://issues.apache.org/jira/browse/HDFS-3084
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha
Affects Versions: 0.24.0, 0.23.3
Reporter: Philip Zeyliger
Assignee: Todd Lipcon
 Attachments: hdfs-3084.txt


 The FenceMethod interface passes along the host:port of the NN that needs to 
 be fenced.  That's great for the common case.  However, it's likely necessary 
 to have extra configuration parameters for fencing, and these are typically 
 keyed off the nameserviceId.namenodeId (if, for nothing else, consistency 
 with all the other parameters that are keyed off of namespaceId.namenodeId).  
 Obviously this can be backed out from the host:port, but it's inconvenient, 
 and requires iterating through all the configs.
 The shell interface exhibits the same issue: host:port is great for most 
 fencers, but if you need extra configs (like the host:port of the power 
 supply unit), those are harder to pipe through without the namenodeId.





[jira] [Commented] (HDFS-3168) Clean up FSNamesystem and BlockManager

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246600#comment-13246600
 ] 

Todd Lipcon commented on HDFS-3168:
---

bq. Aaron, it is nothing to do with it. Any contributor could review code. It 
was a merging problem.

Is that the case? I have no opinion on this particular patch and whether a 
different reviewer might have seen the issue. But I thought you had to get a 
committer +1 to commit things...

 Clean up FSNamesystem and BlockManager
 --

 Key: HDFS-3168
 URL: https://issues.apache.org/jira/browse/HDFS-3168
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 0.23.3

 Attachments: h3168_20120330.patch, h3168_20120402.patch, 
 h3168_20120403.patch








[jira] [Commented] (HDFS-3168) Clean up FSNamesystem and BlockManager

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246651#comment-13246651
 ] 

Todd Lipcon commented on HDFS-3168:
---

By my understanding of our policies, the committer who provides the +1 has to 
be someone other than the patch author. On branches I'm fine being lax here, 
since we need three +1s to merge a branch, but on trunk, I think it merits a 
discussion if there is disagreement on what our policies are.

 Clean up FSNamesystem and BlockManager
 --

 Key: HDFS-3168
 URL: https://issues.apache.org/jira/browse/HDFS-3168
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 0.23.3

 Attachments: h3168_20120330.patch, h3168_20120402.patch, 
 h3168_20120403.patch








[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246671#comment-13246671
 ] 

Todd Lipcon commented on HDFS-3192:
---

Why add multiple stonith paths, given we need external stonith anyway? It just 
adds to the complexity by increasing the number of scenarios we have to debug, 
etc.

That is to say: if the ZKFC dies, then it will lose its lock, and the other 
node will stonith this one when it takes over. What's the benefit of having it 
abort itself at the same time? In fact, it seems to be detrimental, because if 
it stays up, the other node can do a graceful transitionToStandby() call rather 
than having to do something more drastic like a full abort.

 Active NN should exit when it has not received a getServiceStatus() rpc from 
 ZKFC for timeout secs
 --

 Key: HDFS-3192
 URL: https://issues.apache.org/jira/browse/HDFS-3192
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude







[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246718#comment-13246718
 ] 

Todd Lipcon commented on HDFS-3192:
---

bq. I thought we are not going to have external stonith using special devices 
and that is mainly the reason why we are going through hoops to implement 
fencing in journal daemons.

In the current design, which uses a filer, we *require* external stonith 
devices. There is no correct way of doing it without either stonith or storage 
fencing.

The proposal with the journal-daemon based fencing is essentially the same as 
storage fencing - just that we do it with our own software storage instead of a 
NAS/SAN.

bq. Why is the behaviour different from what happens when zkfc loses the 
ephemeral node? Currently zkfc when it loses the ephemeral node will shutdown 
the active NN

No, it doesn't - it will transition it to standby. But, as I commented 
elsewhere, this is redundant, because the _new_ active is actually going to 
fence it anyway before taking over.

bq. Similarly if active NN does not hear from zkfc, it implies that zkfc is 
dead, going through gc pause essentially resulting in loss of ephemeral node.

But this can reduce uptime. For example, imagine an administrator accidentally 
changes the ACL on zookeeper. This causes both ZKFCs to get an authentication 
error and crash at the same time. With your design, both NNs will then commit 
suicide. With the existing implementation, the system will continue to run in 
its existing state -- i.e. no new failovers will occur, but whoever is active 
will remain active.

bq.  If active NN loses quorum, it has to shutdown

Yes, it has to shut down _before_ it does any edits, or it has to be fenced by 
the next active. Notification of session loss is asynchronous. The same is true 
of your proposal. In either case it can take arbitrarily long before it 
notices that it should not be active. So we still require that the new active 
fence it before it becomes active. So, this proposal doesn't solve any problems.

bq. In fact, one of the most of the difficult APIs to implement correctly would 
be transitionToStandby() from active state.

We already have that implemented. It syncs any existing edits, and then stops 
allowing new ones. We allow failover from one node to another without aborting, 
so long as it's graceful. This is perfectly correct. If we need to do a 
non-graceful failover, we fence the node by STONITH or by disallowing further 
access to the edit logs (which indirectly causes the node to abort, since 
logSync() fails).

It seems you're trying to solve problems we've already solved.
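The two failover paths described above (graceful transitionToStandby vs. fencing) can be sketched as follows. This is illustrative only; the interface and method names are made up and do not match the real HAServiceProtocol:

```java
// Hypothetical sketch of the failover ordering: try a graceful
// transitionToStandby() first; if that fails, fence the old active
// (STONITH or cutting off edit-log access) before taking over.
public class FailoverPaths {
    interface Node {
        boolean transitionToStandby();  // syncs pending edits, then stops new ones
        void fence();                   // drastic: STONITH or storage fencing
    }

    /** Returns true if the old active was stopped gracefully. */
    public static boolean failover(Node oldActive) {
        if (oldActive.transitionToStandby()) {
            return true;                // graceful: no fencing needed
        }
        oldActive.fence();              // non-graceful: fence before takeover
        return false;
    }

    public static void main(String[] args) {
        final boolean[] fenced = {false};
        Node healthy = new Node() {
            public boolean transitionToStandby() { return true; }
            public void fence() { fenced[0] = true; }
        };
        // A responsive node is never fenced.
        if (!failover(healthy) || fenced[0]) throw new AssertionError();
    }
}
```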

 Active NN should exit when it has not received a getServiceStatus() rpc from 
 ZKFC for timeout secs
 --

 Key: HDFS-3192
 URL: https://issues.apache.org/jira/browse/HDFS-3192
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude







[jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246729#comment-13246729
 ] 

Todd Lipcon commented on HDFS-2185:
---

Hi Mingjie. Thanks for taking a look.

The idea for the chain of RPCs is from talking with some folks here who work on 
Hadoop deployment. Their opinion was the following: currently, most of the 
Hadoop client tools are too thick. For example, in the current manual 
failover implementation, the fencing is run on the admin client. This means 
that you have to run the haadmin command from a machine that has access to all 
of the necessary fencing scripts, key files, etc. That's a little bizarre -- 
you would expect to configure these kinds of things only on the central 
location, not on the client.

So, we decided that it makes sense to push the management of the whole failover 
process into the FCs themselves, and just use a single RPC to kick off the 
whole failover process. This keeps the client thin.

As for your proposed alternative, here are a few thoughts:

bq. existing manual fo code can be kept mostly
We actually share much of the code already. But, the problem with using the 
existing code exactly as is, is that the failover controllers always expect to 
have complete control over the system. If the state of the NNs changes 
underneath the ZKFC, then the state in ZK will become inconsistent with the 
actual state of the system, and it's very easy to get into split brain 
scenarios. So, the idea is that, when auto-failover is enabled, *all* decisions 
must be made by ZKFCs. That way we can make sure the ZK state doesn't get out 
of sync.

bq. although new RPC is added to ZKFC but we don't need them to talk to each 
other. the manual failover logic is all handled at client – haadmin.
As noted above I think this is a con, not a pro, because it requires 
configuring fencing scripts at the client, and likely requiring that the client 
have read-write access to ZK

bq. easier to extend to the case of multiple standby NNs

I think the extension path to multiple standby is actually equally easy with 
both approaches. The solution in the ZKFC-managed implementation is to add a 
new znode like PreferredActive and have nodes avoid becoming active unless 
they're listed as preferred. The target node of the failover can just set 
itself to be preferred before asking the other node to cede the lock.


Some other advantages that I probably didn't explain well in the design doc:
- this design is fault tolerant. If the target node crashes in the middle of 
the process, then the old active will automatically regain the active state 
after its rejoin timeout elapses. With a client-managed setup, a well-meaning 
admin may ^C the process in the middle and leave the system with no active at 
all.
- no need to introduce disable/enable to auto-failover. Just having both 
nodes quit the election wouldn't work, since one would end up quitting before 
the other, causing a blip where an unnecessary (random) failover occurred. We 
could carefully orchestrate the order of quitting, so the active quits last, 
but I think it still gets complicated.
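The PreferredActive extension mentioned above reduces to a small gating decision. The znode plumbing is omitted and the name `mayBecomeActive` is illustrative, not part of any real ZKFC API:

```java
// Hypothetical sketch of the PreferredActive gate: a node only tries to
// take the active lock if it is the preferred node, or if no preference
// is recorded in ZooKeeper.
public class PreferredActive {
    /**
     * @param self      this node's id
     * @param preferred node id read from the PreferredActive znode, or null
     */
    public static boolean mayBecomeActive(String self, String preferred) {
        return preferred == null || preferred.equals(self);
    }

    public static void main(String[] args) {
        if (!mayBecomeActive("nn1", null)) throw new AssertionError();
        if (!mayBecomeActive("nn1", "nn1")) throw new AssertionError();
        if (mayBecomeActive("nn2", "nn1")) throw new AssertionError();
    }
}
```

The failover target would set itself as preferred before asking the current active to cede the lock, so the election cannot pick the wrong winner during the handoff.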

 HA: HDFS portion of ZK-based FailoverController
 ---

 Key: HDFS-2185
 URL: https://issues.apache.org/jira/browse/HDFS-2185
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: auto-failover, ha
Affects Versions: 0.24.0, 0.23.3
Reporter: Eli Collins
Assignee: Todd Lipcon
 Fix For: Auto failover (HDFS-3042)

 Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, 
 hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, zkfc-design.pdf, 
 zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.tex


 This jira is for a ZK-based FailoverController daemon. The FailoverController 
 is a separate daemon from the NN that does the following:
 * Initiates leader election (via ZK) when necessary
 * Performs health monitoring (aka failure detection)
 * Performs fail-over (standby to active and active to standby transitions)
 * Heartbeats to ensure the liveness
 It should have the same/similar interface as the Linux HA RM to aid 
 pluggability.





[jira] [Commented] (HDFS-3178) Add states for journal synchronization in journal daemon

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246731#comment-13246731
 ] 

Todd Lipcon commented on HDFS-3178:
---

Hi Nicholas. Could you please add some javadoc to the state enum values 
explaining the purpose of each state, and what the transitions are between 
them? Or augment the design doc for HDFS-3092 with this state machine, and 
reference it from the code?

 Add states for journal synchronization in journal daemon
 

 Key: HDFS-3178
 URL: https://issues.apache.org/jira/browse/HDFS-3178
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h3178_20120403_svn_mv.patch, h3178_20120404.patch, 
 h3178_20120404_svn_mv.patch, svn_mv.sh


 Journal in a new daemon has to be synchronized to the current transaction.  
 It requires new states such as WaitingForRoll, Syncing and Synced.





[jira] [Commented] (HDFS-3168) Clean up FSNamesystem and BlockManager

2012-04-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246734#comment-13246734
 ] 

Todd Lipcon commented on HDFS-3168:
---

Let's ask the dev list. I'll start a thread.

 Clean up FSNamesystem and BlockManager
 --

 Key: HDFS-3168
 URL: https://issues.apache.org/jira/browse/HDFS-3168
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 0.23.3

 Attachments: h3168_20120330.patch, h3168_20120402.patch, 
 h3168_20120403.patch







