[jira] [Commented] (HDFS-3307) When saving the FSImage, HDFS (or the SecondaryNameNode) cannot handle some files whose names contain garbled characters (乱码)
[ https://issues.apache.org/jira/browse/HDFS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258331#comment-13258331 ] Todd Lipcon commented on HDFS-3307: --- Rather than change the code to not use UTF8, I think we should figure out why the UTF8 writeString function is writing the wrong data. Is 乱码 the string that causes the problem? I tried to reproduce using this string, but it works fine here. (I did hadoop fs -put /etc/issue '乱码', then successfully restarted and catted the file) When saving the FSImage, HDFS (or the SecondaryNameNode) cannot handle some files whose names contain garbled characters (乱码) - Key: HDFS-3307 URL: https://issues.apache.org/jira/browse/HDFS-3307 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1 Environment: SUSE LINUX Reporter: yixiaohua Attachments: FSImage.java Original Estimate: 12h Remaining Estimate: 12h This is the log of the exception from the SecondaryNameNode:
{noformat}
2012-03-28 00:48:42,553 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Found lease for non-existent file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@???
??tor.qzone.qq.com/keypart-00174
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
    at java.lang.Thread.run(Thread.java:619)
{noformat}
This is the log about the file from the NameNode:
{noformat}
2012-03-28 00:32:26,528 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34 cmd=create src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 dst=null perm=boss:boss:rw-r--r--
2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174. blk_2751836614265659170_184668759
2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 is closed by DFSClient_attempt_201203271849_0016_r_000174_0
2012-03-28 00:37:50,315 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34 cmd=rename src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 dst=/user/boss/pgv/fission/task16/split/ @?
tor.qzone.qq.com/keypart-00174 perm=boss:boss:rw-r--r--
{noformat}
After checking the code that saves the FSImage, I found a problem that may be a bug in the HDFS code; I paste it below. This is the saveFSImage method in FSImage.java, with marks at the problem code:
{code}
/**
 * Save the contents of the FS image to the file.
 */
void saveFSImage(File newFile) throws IOException {
  FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
  FSDirectory fsDir = fsNamesys.dir;
  long startTime = FSNamesystem.now();
  //
  // Write out data
  //
  DataOutputStream out = new DataOutputStream(
      new BufferedOutputStream(
          new FileOutputStream(newFile)));
  try {
    ...
    // save the rest of the nodes
    saveImage(strbuf, 0, fsDir.rootDir, out);   // <-- problem
    fsNamesys.saveFilesUnderConstruction(out);  // <-- problem, detail is below
    strbuf = null;
  } finally {
    out.close();
  }
  LOG.info("Image file of size " + newFile.length() + " saved in "
      + (FSNamesystem.now() - startTime)/1000 + ...
{code}
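Todd's reproduction attempt above can be sketched as a round trip through Java's modified UTF-8, which is the wire format the legacy org.apache.hadoop.io.UTF8 class is based on. This is an illustrative check using only java.io (it does not use the actual UTF8 class, and the class name Utf8RoundTrip is made up here); a string such as 乱码 that survives this round trip should serialize cleanly, while supplementary characters outside the BMP and U+0000 are where modified UTF-8 diverges from standard UTF-8 and can confuse non-Java readers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Round-trip a path component through DataOutputStream.writeUTF /
// DataInputStream.readUTF, which use the modified-UTF-8 encoding.
public class Utf8RoundTrip {
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);   // writes a 2-byte length followed by the bytes
            out.flush();
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray()));
            return in.readUTF();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for in-memory streams
        }
    }

    public static void main(String[] args) {
        // BMP characters like 乱码 round-trip cleanly within Java.
        System.out.println(roundTrip("乱码").equals("乱码"));
    }
}
```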
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258512#comment-13258512 ] Todd Lipcon commented on HDFS-3092: --- Can you clarify a few things in this document? - In ParallelWritesWithBarrier, what happens to the journals which timeout/fail? It seems you need to mark them as failed in ZK or something in order to be correct. But if you do that, why do you need Q to be a quorum? Q=1 should suffice for correctness, and Q=2 should suffice in order to always be available to recover. It seems the protocol should be closer to: 1) send out write request to all active JNs 2) wait until all respond, or a configurable timeout 3) any that do not respond are marked as failed in ZK 4) If the remaining number of JNs is sufficient (I'd guess 2) then succeed the write. Otherwise fail the write and abort. The recovery protocol here is also a little tricky. I haven't seen a description of the specifics - there are a number of cases to handle - eg even if a write appears to fail from the perspective of the writer, it may have actually succeeded. Another situation: what happens if the writer crashes between step 2 and step 3 (so the JNs have differing number of txns, but ZK indicates they're all up to date?) Regarding quorum commits: bq. b. The journal set is fixed in the config. Hard to add/replace hardware. There are protocols that could be used to change the quorum size/membership at runtime. They do add complexity, though, so I think they should be seen as a future improvement - but not be discounted as impossible. Another point is that hardware replacement can easily be treated the same as a full crash and loss of disk. If one node completely crashes, a new node could be brought in with the same hostname with no complicated protocols. Adding or removing nodes shouldn't be hard to support during a downtime window, which I think satisfies most use cases pretty well. 
Regarding bookkeeper: - other operational concerns aren't mentioned: eg it doesn't use Hadoop metrics, doesn't use the same style of configuration files, daemon scripts, etc. Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: ComparisonofApproachesforHAJournals.pdf, MultipleSharedJournals.pdf, MultipleSharedJournals.pdf, MultipleSharedJournals.pdf Currently the standby namenode relies on reading shared editlogs to stay current with the active namenode for namespace changes. The BackupNode used streaming edits from the active namenode for the same purpose. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in the standby will get the editlogs from the active and write them to local edits. To begin with, the existing standby mechanism of reading from a file will continue to be used, but reading from the local edits instead of from the shared edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
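The four-step write protocol Todd outlines (send to all active JNs, wait with a timeout, mark non-responders as failed, then succeed or abort based on how many journals remain) can be sketched as a small decision routine. This is an illustrative sketch only, not the actual JournalNode code; the class and field names (JournalWriteDecision, minJournals) are made up, and in a real system the FAILED marking would be recorded in ZK as the comment describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the accept/abort decision for a parallel journal write:
// any journal that misses the timeout is marked failed, and the write
// succeeds only if enough journals remain active.
public class JournalWriteDecision {
    enum State { ACTIVE, FAILED }

    final Map<String, State> journals = new HashMap<>();
    final int minJournals; // e.g. 2, so the write can still be recovered later

    JournalWriteDecision(int minJournals, String... nodes) {
        this.minJournals = minJournals;
        for (String n : nodes) journals.put(n, State.ACTIVE);
    }

    /** @param responded journals that acked before the timeout (steps 1-2) */
    boolean commitWrite(Set<String> responded) {
        for (Map.Entry<String, State> e : journals.entrySet()) {
            if (e.getValue() == State.ACTIVE && !responded.contains(e.getKey())) {
                e.setValue(State.FAILED); // step 3: mark non-responder as failed
            }
        }
        long active = journals.values().stream()
                .filter(s -> s == State.ACTIVE).count();
        return active >= minJournals; // step 4: succeed, or fail and abort
    }
}
```

With three journals and minJournals = 2, losing one journal still commits, but a second loss aborts the write, matching the availability argument in the comment.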
[jira] [Commented] (HDFS-3305) GetImageServlet should consider SBN a valid requestor in a secure HA setup
[ https://issues.apache.org/jira/browse/HDFS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257708#comment-13257708 ] Todd Lipcon commented on HDFS-3305: --- +1 pending jenkins GetImageServlet should consider SBN a valid requestor in a secure HA setup Key: HDFS-3305 URL: https://issues.apache.org/jira/browse/HDFS-3305 Project: Hadoop HDFS Issue Type: Bug Components: ha, name-node Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3305.patch Right now only the NN and 2NN are considered valid requestors. This won't work if the ANN and SBN use distinct principal names.
[jira] [Commented] (HDFS-3271) src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc
[ https://issues.apache.org/jira/browse/HDFS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256802#comment-13256802 ] Todd Lipcon commented on HDFS-3271: --- It's not a bug in the library, but rather a bug in one of the nss backends (sssd). Plus, that would require re-building the native libs on every different version of EL6, whereas right now a single binary works against any EL6 release. src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc -- Key: HDFS-3271 URL: https://issues.apache.org/jira/browse/HDFS-3271 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Use the re-entrant versions of these functions rather than using locking
[jira] [Commented] (HDFS-3290) Use a better local directory layout for the datanode
[ https://issues.apache.org/jira/browse/HDFS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256999#comment-13256999 ] Todd Lipcon commented on HDFS-3290: --- It doesn't do a search for a block. The DN keeps the block map in memory. But, I do think this is a good idea, as it will make it easier in the future to avoid having to keep the block map in memory on the DNs. Use a better local directory layout for the datanode Key: HDFS-3290 URL: https://issues.apache.org/jira/browse/HDFS-3290 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor When the HDFS DataNode stores chunks in a local directory, it currently puts all of the chunk files into either one big directory, or a collection of directories. However, there is no way to know which directory a given block will end up in, given its ID. As the number of files increases, this does not scale well. Similar to the git version control system, HDFS should create a few different top level directories keyed off of a few bits in the chunk ID. Git uses 8 bits. This substantially cuts down on the number of chunk files in the same directory and gives increased performance, while not compromising O(1) lookup of chunks.
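The git-style sharding proposed above can be sketched in a few lines: key a subdirectory off the low 8 bits of the block ID, giving 256 buckets, so lookup stays O(1) and no single directory accumulates every block file. This is an illustration of the idea only, not the actual DataNode layout, and the helper and directory naming (subdirFor, "subdirXX") are made up for this sketch.

```java
// Illustrative git-style sharding: route each block file into one of 256
// subdirectories keyed off the low 8 bits of its ID, analogous to git
// using the first two hex characters of an object hash as a directory name.
public class BlockDirSharding {
    static String subdirFor(long blockId) {
        int bucket = (int) (blockId & 0xFF); // low 8 bits -> 256 buckets
        return String.format("subdir%02x", bucket);
    }

    public static void main(String[] args) {
        // The bucket depends only on the low byte of the ID.
        System.out.println(subdirFor(0x1FFL)); // same bucket as 0xFF
    }
}
```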
[jira] [Commented] (HDFS-3271) src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc
[ https://issues.apache.org/jira/browse/HDFS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255347#comment-13255347 ] Todd Lipcon commented on HDFS-3271: --- bq. Or test for POSIX compliance at configure time... Testing for presence of a race is tricky. bq. (FWIW, I'm still mostly convinced that this ugly hack was a waste of time for the majority of folks.) Sure, but for the folks who needed it, it saved their clusters from constant segfaults. src/fuse_users.c: use re-entrant versions of getpwuid, getgid, etc -- Key: HDFS-3271 URL: https://issues.apache.org/jira/browse/HDFS-3271 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Use the re-entrant versions of these functions rather than using locking
[jira] [Commented] (HDFS-2631) Rewrite fuse-dfs to use the webhdfs protocol
[ https://issues.apache.org/jira/browse/HDFS-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255350#comment-13255350 ] Todd Lipcon commented on HDFS-2631: --- That seems reasonable. I think it's a given that we need to keep the original libhdfs for performance. Having a libhdfs-alike that goes over HTTP seems reasonable enough but not always preferable. To speak to each of the original points: bq. Compatibility - allows a single fuse client to work across server versions We need to address compatibility for clients in general. Our Java client (and hence libhdfs) need this just as much as fuse. bq. Works with both WebHDFS and Hoop since they are protocol compatible I guess this is an advantage, but given that libhdfs already wraps arbitrary hadoop filesystems, we already have this capability. bq. Removes the overhead related to libhdfs (forking a jvm) fuse is a long-running client, so the fork overhead seems minimal. Recent improvements in libhdfs have also cut out most of the copying overhead. bq. Makes it easier to support features like security Perhaps - but libhdfs needs security anyway, so I don't think it buys us much. Rewrite fuse-dfs to use the webhdfs protocol Key: HDFS-2631 URL: https://issues.apache.org/jira/browse/HDFS-2631 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/fuse-dfs Reporter: Eli Collins Assignee: Jaimin D Jetly We should port the implementation of fuse-dfs to use the webhdfs protocol. This has a number of benefits: * Compatibility - allows a single fuse client to work across server versions * Works with both WebHDFS and Hoop since they are protocol compatible * Removes the overhead related to libhdfs (forking a jvm) * Makes it easier to support features like security
[jira] [Commented] (HDFS-3285) Null pointer exception at ClientNamenodeProtocolTranslatorPB while running fetchdt
[ https://issues.apache.org/jira/browse/HDFS-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255599#comment-13255599 ] Todd Lipcon commented on HDFS-3285: --- Dup of HDFS-2956? Null pointer exception at ClientNamenodeProtocolTranslatorPB while running fetchdt --- Key: HDFS-3285 URL: https://issues.apache.org/jira/browse/HDFS-3285 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: Brahma Reddy Battula Priority: Minor Fix For: 2.0.0, 3.0.0 Scenario: Run the following command: ./hdfs fetchdt http://**:50070 Then I get the following null pointer exception:
{noformat}
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:771)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:650)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:766)
    at org.apache.hadoop.hdfs.tools.DelegationTokenFetcher$1.run(DelegationTokenFetcher.java:191)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
    at org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.main(DelegationTokenFetcher.java:144)
{noformat}
[jira] [Commented] (HDFS-3268) Hdfs mishandles token service incompatible with HA
[ https://issues.apache.org/jira/browse/HDFS-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254916#comment-13254916 ] Todd Lipcon commented on HDFS-3268: --- +1, will commit this momentarily. Thanks, Daryn. Hdfs mishandles token service incompatible with HA Key: HDFS-3268 URL: https://issues.apache.org/jira/browse/HDFS-3268 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs client Affects Versions: 0.24.0, 2.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-3268-1.patch, HDFS-3268.patch The {{Hdfs AbstractFileSystem}} is overwriting the token service set by the {{DFSClient}}. The service is not necessarily the correct one since {{DFSClient}} is responsible for the service. Most importantly, this improper behavior is overwriting the HA logical service which indirectly renders {{FileContext}} incompatible with HA.
[jira] [Commented] (HDFS-3284) bootstrapStandby fails in secure cluster
[ https://issues.apache.org/jira/browse/HDFS-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254934#comment-13254934 ] Todd Lipcon commented on HDFS-3284: --- ah, the addSecurityConfiguration function is only on the auto-HA branch. Let me pull that into this patch as well. bootstrapStandby fails in secure cluster Key: HDFS-3284 URL: https://issues.apache.org/jira/browse/HDFS-3284 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3284.txt HDFS-3247 improved bootstrapStandby to check if the other NN is in active state before trying to bootstrap. But, it forgot to set up the kerberos principals in the config before doing so. So, bootstrapStandby now fails with Failed to specify server's Kerberos principal name in a secure cluster. (Credit to Stephen Chu for finding this)
[jira] [Commented] (HDFS-3282) Expose getFileLength API.
[ https://issues.apache.org/jira/browse/HDFS-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254963#comment-13254963 ] Todd Lipcon commented on HDFS-3282: --- Nicholas: despite us advertising DFSDataInputStream as a private API, I imagine this change would break people. Could we instead just add a new interface which would be implemented by the existing class? Expose getFileLength API. - Key: HDFS-3282 URL: https://issues.apache.org/jira/browse/HDFS-3282 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This JIRA is to expose the getFileLength API through a new public DistributedFileSystemInfo class. I would appreciate it if someone could suggest a good name for this public class. Nicholas, did you plan any special design for this public client class?
[jira] [Commented] (HDFS-3161) 20 Append: Excluded DN replica from recovery should be removed from DN.
[ https://issues.apache.org/jira/browse/HDFS-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255026#comment-13255026 ] Todd Lipcon commented on HDFS-3161: --- Hi Uma/Vinay. I ran into an issue like this without use of append(): - Client writing blk_N_GS1 to DN1, DN9, DN10 - Pipeline failed. commitBlockSynchronization succeeded with DN9 and DN10, sets gs to blk_N_GS2 - Client closes the pipeline - NN issues replication request of blk_N_GS2 from DN9 to DN1 - DN1 already has blk_N_GS1 in its ongoingCreates map I'm not sure if this can cause any serious issue with the block (it didn't in my case), but I agree that, if a replication request happens for a block with a higher genstamp, it should interrupt the old block's ongoingCreate. If the replication request is a lower genstamp, it should be ignored. 20 Append: Excluded DN replica from recovery should be removed from DN. --- Key: HDFS-3161 URL: https://issues.apache.org/jira/browse/HDFS-3161 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Reporter: suja s Priority: Critical Fix For: 1.0.3
1) DN1-DN2-DN3 are in the pipeline.
2) Client is killed abruptly.
3) One DN has restarted, say DN3.
4) In DN3, info.wasRecoveredOnStartup() will be true.
5) NN recovery is triggered; DN3 is skipped from recovery due to the above check.
6) Now DN1 and DN2 have blocks with generation stamp 2, DN3 has an older generation stamp, say 1, and DN3 still has this block entry in ongoingCreates.
7) As part of recovery the file has been closed with only two live replicas (from DN1 and DN2).
8) So, NN issued the command for replication. Now DN3 also has the replica with the newer generation stamp.
9) Now DN3 contains 2 replicas on disk, and one entry in ongoingCreates referring to the blocksBeingWritten directory. When we call append/leaseRecovery, it may again skip this node for that recovery, as the blockId entry is still present in ongoingCreates with startup recovery true.
It may keep repeating this dance for every recovery, and this stale replica will not be cleaned until we restart the cluster. The actual replica will be transferred to this node only through the replication process. Also, those replicated blocks will unnecessarily get invalidated after the next recoveries.
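The genstamp rule Todd proposes (a replication request with a higher generation stamp interrupts the stale ongoingCreate; one with a lower genstamp is ignored) can be sketched as a simple arbitration. This is an illustrative sketch, not the actual DataNode code; the class and method names (GenstampArbitration, onReplicationRequest) are made up here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of genstamp arbitration for an incoming replication request
// against a possibly stale ongoingCreates entry on the DataNode.
public class GenstampArbitration {
    final Map<Long, Long> ongoingCreates = new HashMap<>(); // blockId -> genstamp

    /** @return true if the replication request should proceed */
    boolean onReplicationRequest(long blockId, long requestGenstamp) {
        Long currentGs = ongoingCreates.get(blockId);
        if (currentGs == null) {
            return true;                    // no stale create to clean up
        }
        if (requestGenstamp > currentGs) {
            ongoingCreates.remove(blockId); // interrupt the old block's create
            return true;
        }
        return false;                       // stale (lower-genstamp) request: ignore
    }
}
```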
[jira] [Commented] (HDFS-3284) bootstrapStandby fails in secure cluster
[ https://issues.apache.org/jira/browse/HDFS-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255101#comment-13255101 ] Todd Lipcon commented on HDFS-3284: --- The test failure is unrelated (this patch doesn't touch that area of the code) bootstrapStandby fails in secure cluster Key: HDFS-3284 URL: https://issues.apache.org/jira/browse/HDFS-3284 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3284.txt, hdfs-3284.txt HDFS-3247 improved bootstrapStandby to check if the other NN is in active state before trying to bootstrap. But, it forgot to set up the kerberos principals in the config before doing so. So, bootstrapStandby now fails with Failed to specify server's Kerberos principal name in a secure cluster. (Credit to Stephen Chu for finding this)
[jira] [Commented] (HDFS-2631) Rewrite fuse-dfs to use the webhdfs protocol
[ https://issues.apache.org/jira/browse/HDFS-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255255#comment-13255255 ] Todd Lipcon commented on HDFS-2631: --- I'm a little confused: why is this a good idea? Seems like it's likely to end up much slower than the current implementation. I'd prefer it as another option, rather than a rewrite. Rewrite fuse-dfs to use the webhdfs protocol Key: HDFS-2631 URL: https://issues.apache.org/jira/browse/HDFS-2631 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/fuse-dfs Reporter: Eli Collins Assignee: Jaimin D Jetly We should port the implementation of fuse-dfs to use the webhdfs protocol. This has a number of benefits: * Compatibility - allows a single fuse client to work across server versions * Works with both WebHDFS and Hoop since they are protocol compatible * Removes the overhead related to libhdfs (forking a jvm) * Makes it easier to support features like security
[jira] [Commented] (HDFS-3042) Automatic failover support for NN HA
[ https://issues.apache.org/jira/browse/HDFS-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253883#comment-13253883 ] Todd Lipcon commented on HDFS-3042: --- Sanjay: this is being done in a branch... it's in branches/HDFS-3042 in SVN. Automatic failover support for NN HA Key: HDFS-3042 URL: https://issues.apache.org/jira/browse/HDFS-3042 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha Reporter: Todd Lipcon Assignee: Todd Lipcon HDFS-1623 was the umbrella task for implementation of NN HA capabilities. However, it only focused on manually-triggered failover. Given that the HDFS-1623 branch will be merged shortly, I'm opening this JIRA to consolidate/track subtasks for automatic failover support and related improvements.
[jira] [Commented] (HDFS-2708) Stats for the # of blocks per DN
[ https://issues.apache.org/jira/browse/HDFS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253903#comment-13253903 ] Todd Lipcon commented on HDFS-2708: --- +1 pending jenkins Stats for the # of blocks per DN Key: HDFS-2708 URL: https://issues.apache.org/jira/browse/HDFS-2708 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Priority: Minor Attachments: HDFS-2708.patch It would be useful for tools to be able to retrieve the total number of blocks on each datanode.
[jira] [Commented] (HDFS-3280) DFSOutputStream.sync should not be synchronized
[ https://issues.apache.org/jira/browse/HDFS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253941#comment-13253941 ] Todd Lipcon commented on HDFS-3280: --- Verified that this increased my benchmark performance by a factor of two. DFSOutputStream.sync should not be synchronized --- Key: HDFS-3280 URL: https://issues.apache.org/jira/browse/HDFS-3280 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-3280.txt HDFS-895 added an optimization to make hflush() much faster by unsynchronizing it. But, we forgot to un-synchronize the deprecated {{sync()}} wrapper method. This makes the HBase WAL really slow on 0.23+ since it doesn't take advantage of HDFS-895 anymore.
[jira] [Commented] (HDFS-3280) DFSOutputStream.sync should not be synchronized
[ https://issues.apache.org/jira/browse/HDFS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253951#comment-13253951 ] Todd Lipcon commented on HDFS-3280: --- bq. Ah, so this explains what you guys thought might be an interaction with Nagle? Yep, turned out to be much simpler :) The patch failed on Hudson due to HDFS-3034 having removed the deprecated method. I'll commit this based on Aaron's +1 and based on my manual stress testing using HBase's HLog class, which uses this method. No unit tests since it's hard to unit test for performance, and the hflush equivalent is already tested by TestMultithreadedHflush. DFSOutputStream.sync should not be synchronized --- Key: HDFS-3280 URL: https://issues.apache.org/jira/browse/HDFS-3280 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-3280.txt HDFS-895 added an optimization to make hflush() much faster by unsynchronizing it. But, we forgot to un-synchronize the deprecated {{sync()}} wrapper method. This makes the HBase WAL really slow on 0.23+ since it doesn't take advantage of HDFS-895 anymore.
[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack
[ https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252240#comment-13252240 ] Todd Lipcon commented on HDFS-3256: --- I think there's an issue here with safemode's delayed initialization of repl queues. As the DNs are checking in, when the cluster transitions from single-rack to multi-rack, it will call processMisReplicatedBlocks, even if the threshold (DFS_NAMENODE_REPL_QUEUE_THRESHOLD_PCT_KEY) hasn't been crossed. I think you need to somehow tie this back to the flag in safemode which determines whether misreplicated blocks have been processed yet. HDFS considers blocks under-replicated if topology script is configured with only 1 rack Key: HDFS-3256 URL: https://issues.apache.org/jira/browse/HDFS-3256 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3256.patch, HDFS-3256.patch HDFS treats the mere presence of a topology script being configured as evidence that there are multiple racks. If there is in fact only a single rack, the NN will try to place the blocks on at least two racks, and thus blocks will be considered to be under-replicated.
[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack
[ https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252560#comment-13252560 ] Todd Lipcon commented on HDFS-3256: --- Is this check actually correct? I think you need to check whether the repl queues are initialized, explicitly. For example, you could enter manual safe mode, in which case the repl queues are still being tracked, and it's incorrect to not call processMisReplicatedBlocks. HDFS considers blocks under-replicated if topology script is configured with only 1 rack Key: HDFS-3256 URL: https://issues.apache.org/jira/browse/HDFS-3256 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch HDFS treats the mere presence of a topology script being configured as evidence that there are multiple racks. If there is in fact only a single rack, the NN will try to place the blocks on at least two racks, and thus blocks will be considered to be under-replicated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
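The distinction the two HDFS-3256 review comments above are drawing — "in safe mode" versus "replication queues initialized" — can be sketched as an explicit guard flag. This is an illustrative model only (the real logic lives in the NameNode's FSNamesystem; names below are invented):

```java
// Sketch of the guard under discussion: when the cluster grows from one rack
// to many, re-scan blocks for mis-replication only if replication queues are
// actually initialized. Keying off safe mode alone is wrong both ways:
// queues may initialize before safe mode exits (threshold crossed), and
// manual safe mode keeps queues live, so the scan must still run then.
public class ReplQueueGuardSketch {
    boolean replQueuesInitialized;
    int misReplicationScans;

    void onClusterBecameMultiRack() {
        if (!replQueuesInitialized) {
            // Defer: queue initialization will scan every block anyway.
            return;
        }
        misReplicationScans++; // stands in for processMisReplicatedBlocks()
    }

    public static void main(String[] args) {
        ReplQueueGuardSketch nn = new ReplQueueGuardSketch();
        nn.onClusterBecameMultiRack();   // startup, queues not ready: skipped
        nn.replQueuesInitialized = true; // e.g. later, even in manual safemode
        nn.onClusterBecameMultiRack();   // queues live: scan runs
        System.out.println(nn.misReplicationScans); // prints 1
    }
}
```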
[jira] [Commented] (HDFS-3255) HA DFS returns wrong token service
[ https://issues.apache.org/jira/browse/HDFS-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252563#comment-13252563 ] Todd Lipcon commented on HDFS-3255: --- Hm, I'm just surprised, because it seemed to work when we tested running MR against an HA cluster. But maybe there's some bug that happens when it comes time to renew the token, or something? bq. I can see if I can use Jitendra's feature to enable security for a unit test if you'd like. When I tried to use that feature, I couldn't get it to work. Maybe you'll have better luck? If you can describe a manual test scenario that seems good enough for me. HA DFS returns wrong token service -- Key: HDFS-3255 URL: https://issues.apache.org/jira/browse/HDFS-3255 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs client Affects Versions: 2.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-3255.patch {{fs.getCanonicalService()}} must be equal to {{fs.getDelegationToken(renewer).getService()}}. When HA is enabled, the DFS token's service is a logical uri, but {{dfs.getCanonicalService()}} is only returning the hostname of the logical uri. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack
[ https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252614#comment-13252614 ] Todd Lipcon commented on HDFS-3256: --- patch looks good. One minor comment: can you please add an INFO log saying something like: Datanode blah blah joining cluster has expanded a formerly single-rack cluster to multi-rack. Re-checking all blocks for replication, since they should now be replicated cross-rack HDFS considers blocks under-replicated if topology script is configured with only 1 rack Key: HDFS-3256 URL: https://issues.apache.org/jira/browse/HDFS-3256 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch HDFS treats the mere presence of a topology script being configured as evidence that there are multiple racks. If there is in fact only a single rack, the NN will try to place the blocks on at least two racks, and thus blocks will be considered to be under-replicated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3259) NameNode#initializeSharedEdits should populate shared edits dir with edit log segments
[ https://issues.apache.org/jira/browse/HDFS-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252660#comment-13252660 ] Todd Lipcon commented on HDFS-3259: --- A few issues: - The file copy you're doing could fail with a half-written file in the middle. I think you need to copy to a tmp filename and then rename once the copy is done. You could use AtomicFileOutputStream here, actually, since fsyncing it seems reasonable. - Rather than blindly casting, I think it's worth checking instanceof, and bailing out with an error if one of the journals isn't a FileJournalManager. The error can say that this initialization feature currently only works with file-based streams. That's better than an ugly ClassCastException stack trace NameNode#initializeSharedEdits should populate shared edits dir with edit log segments -- Key: HDFS-3259 URL: https://issues.apache.org/jira/browse/HDFS-3259 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3259.patch, HDFS-3259.patch Currently initializeSharedEdits formats the shared dir so that subsequent edit log segments will be written there. However, it would be nice to automatically populate this dir with edit log segments with transactions going back to the last fsimage. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
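The two review points in the HDFS-3259 comment above — copy to a temporary name then rename so a crash never exposes a half-written segment, and verify a journal's type before casting — can be sketched as follows. The method names are illustrative, not the actual patch (which used AtomicFileOutputStream for the fsync-then-rename step):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicCopySketch {
    // (1) Crash-safe copy: write under a temp name, then rename into place.
    // The rename is all-or-nothing, so readers of destDir see either nothing
    // or the complete segment, never a partially copied file.
    static void copySegmentAtomically(Path src, Path destDir) throws IOException {
        Path tmp = destDir.resolve(src.getFileName() + ".tmp");
        Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, destDir.resolve(src.getFileName()),
                   StandardCopyOption.ATOMIC_MOVE);
    }

    // (2) Fail fast with a readable message instead of a blind cast that
    // would surface as an ugly ClassCastException stack trace.
    static <T> T checkedCast(Object o, Class<T> cls) throws IOException {
        if (!cls.isInstance(o)) {
            throw new IOException("initializeSharedEdits only supports "
                + cls.getSimpleName() + " journals, but found "
                + o.getClass().getSimpleName());
        }
        return cls.cast(o);
    }

    public static void main(String[] args) throws IOException {
        Path srcDir = Files.createTempDirectory("edits-src");
        Path destDir = Files.createTempDirectory("edits-shared");
        Path seg = Files.write(srcDir.resolve("edits_0000001-0000042"),
                               "txns".getBytes());
        copySegmentAtomically(seg, destDir);
        System.out.println(Files.exists(destDir.resolve("edits_0000001-0000042")));
    }
}
```

In the real patch, `checkedCast` would guard the cast to FileJournalManager, turning the ClassCastException into an explanation that the initialization feature only works with file-based journals.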
[jira] [Commented] (HDFS-3255) HA DFS returns wrong token service
[ https://issues.apache.org/jira/browse/HDFS-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252668#comment-13252668 ] Todd Lipcon commented on HDFS-3255: --- k, sounds good. +1 HA DFS returns wrong token service -- Key: HDFS-3255 URL: https://issues.apache.org/jira/browse/HDFS-3255 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs client Affects Versions: 2.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-3255.patch {{fs.getCanonicalService()}} must be equal to {{fs.getDelegationToken(renewer).getService()}}. When HA is enabled, the DFS token's service is a logical uri, but {{dfs.getCanonicalService()}} is only returning the hostname of the logical uri. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack
[ https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252677#comment-13252677 ] Todd Lipcon commented on HDFS-3256: --- bq. not yet processing processing repl queues Too many processings. You've been hanging out with Eli too much. Also, I'd say we should just log it at DEBUG level for the Not checking case. HDFS considers blocks under-replicated if topology script is configured with only 1 rack Key: HDFS-3256 URL: https://issues.apache.org/jira/browse/HDFS-3256 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch HDFS treats the mere presence of a topology script being configured as evidence that there are multiple racks. If there is in fact only a single rack, the NN will try to place the blocks on at least two racks, and thus blocks will be considered to be under-replicated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3267) TestBlocksWithNotEnoughRacks races with DN startup
[ https://issues.apache.org/jira/browse/HDFS-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252784#comment-13252784 ] Todd Lipcon commented on HDFS-3267: --- The test can be made to fail by adding a sleep(5000) at the start of BPServiceActor's thread. TestBlocksWithNotEnoughRacks races with DN startup -- Key: HDFS-3267 URL: https://issues.apache.org/jira/browse/HDFS-3267 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor In TestBlocksWithNotEnoughRacks.testCorruptBlockRereplicatedAcrossRacks, it restarts a DN, and then proceeds to call waitCorruptReplicas. But, because of HDFS-3266, it doesn't actually wait very long while checking for the corrupt block to be reported. Since the DN starts back up asynchronously, the test will fail if it starts too slowly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack
[ https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252788#comment-13252788 ] Todd Lipcon commented on HDFS-3256: --- +1 HDFS considers blocks under-replicated if topology script is configured with only 1 rack Key: HDFS-3256 URL: https://issues.apache.org/jira/browse/HDFS-3256 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch, HDFS-3256.patch HDFS treats the mere presence of a topology script being configured as evidence that there are multiple racks. If there is in fact only a single rack, the NN will try to place the blocks on at least two racks, and thus blocks will be considered to be under-replicated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3268) Hdfs mishandles token service incompatible with HA
[ https://issues.apache.org/jira/browse/HDFS-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252853#comment-13252853 ] Todd Lipcon commented on HDFS-3268: --- lgtm, except a typo in this comment: {code} + * Get a canonical token service name for this client's tokens. Null should + * tokens if the client is not using tokens. {code} Hdfs mishandles token service incompatible with HA Key: HDFS-3268 URL: https://issues.apache.org/jira/browse/HDFS-3268 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs client Affects Versions: 0.24.0, 2.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-3268.patch The {{Hdfs AbstractFileSystem}} is overwriting the token service set by the {{DFSClient}}. The service is not necessarily the correct one since {{DFSClient}} is responsible for the service. Most importantly, this improper behavior is overwriting the HA logical service which indirectly renders {{FileContext}} incompatible with HA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252855#comment-13252855 ] Todd Lipcon commented on HDFS-3092: --- bq. Perhaps we can get away with this by using some assumptions on timeouts, or by additional constraints on the standby. Eg. that it only syncs with finalized edit segments. That's my plan in HDFS-3077, and in fact that's the current behavior of the SBN, even when operating on NFS. Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf, MultipleSharedJournals.pdf, MultipleSharedJournals.pdf Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. BackupNode used streaming edits from active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in standby will get the editlogs from the active and write it to local edits. To begin with, the existing standby mechanism of reading from a file, will continue to be used, instead of from shared edits, from the local edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3092) Enable journal protocol based editlog streaming for standby namenode
[ https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252923#comment-13252923 ] Todd Lipcon commented on HDFS-3092: --- bq. By 'my plan' are you referring to an API on the journal node to read latest edits that replaces the current standby NN tailing code? Yep - well, not replaces, but rather just implements the correct APIs in JournalManager. We already have read side APIs there to get an input stream starting at a given txid. We just need implementations that do the remote reads. Enable journal protocol based editlog streaming for standby namenode Key: HDFS-3092 URL: https://issues.apache.org/jira/browse/HDFS-3092 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 0.24.0, 0.23.3 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: MultipleSharedJournals.pdf, MultipleSharedJournals.pdf, MultipleSharedJournals.pdf Currently standby namenode relies on reading shared editlogs to stay current with the active namenode, for namespace changes. BackupNode used streaming edits from active namenode for doing the same. This jira is to explore using journal protocol based editlog streams for the standby namenode. A daemon in standby will get the editlogs from the active and write it to local edits. To begin with, the existing standby mechanism of reading from a file, will continue to be used, instead of from shared edits, from the local edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251925#comment-13251925 ] Todd Lipcon commented on HDFS-3094: --- A few nits on the branch-1 patch: {code} + System.err.println("Format aborted as dir " + curDir + " exits."); {code} typo: exists. I would also reformat as: "Format aborted: " + curDir + " exists." the same typo (exits for exists) is in one of the test cases {code} + boolean isConfirmationNeeded, boolean isInterActive) throws IOException { {code} Should be {{isInteractive}} -- not capital 'A' {code} + StartupOption.FORMAT.getName() + "[" + StartupOption.FORCE.getName() + " ] [" + StartupOption.NONINTERACTIVE.getName() + "] | [" + {code} Formatting is off here. When I run it I see: {code} Usage: java NameNode [-format[-force ] [-nonInteractive] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] {code} should read: {code} Usage: java NameNode [-format [-force] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] {code} add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Fix For: 2.0.0 Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3256) HDFS considers blocks under-replicated if topology script is configured with only 1 rack
[ https://issues.apache.org/jira/browse/HDFS-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251958#comment-13251958 ] Todd Lipcon commented on HDFS-3256: --- Looks good. +1 pending jenkins HDFS considers blocks under-replicated if topology script is configured with only 1 rack Key: HDFS-3256 URL: https://issues.apache.org/jira/browse/HDFS-3256 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3256.patch HDFS treats the mere presence of a topology script being configured as evidence that there are multiple racks. If there is in fact only a single rack, the NN will try to place the blocks on at least two racks, and thus blocks will be considered to be under-replicated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3255) HA DFS returns wrong token service
[ https://issues.apache.org/jira/browse/HDFS-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252039#comment-13252039 ] Todd Lipcon commented on HDFS-3255: --- hey Daryn. Fix looks good. Is there a manual test that can be run with MR, for example, to verify this as well? I.e., how did you discover this issue? HA DFS returns wrong token service -- Key: HDFS-3255 URL: https://issues.apache.org/jira/browse/HDFS-3255 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs client Affects Versions: 2.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-3255.patch {{fs.getCanonicalService()}} must be equal to {{fs.getDelegationToken(renewer).getService()}}. When HA is enabled, the DFS token's service is a logical uri, but {{dfs.getCanonicalService()}} is only returning the hostname of the logical uri. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3243) TestParallelRead timing out on jenkins
[ https://issues.apache.org/jira/browse/HDFS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250803#comment-13250803 ] Todd Lipcon commented on HDFS-3243: --- Seems like this test was lengthened by HDFS-2834. I'm not sure yet whether it represents a performance regression, or if the test itself was just changed in such a way that it runs much longer. On branch-2: {code} testcase time=9.15 classname=org.apache.hadoop.hdfs.TestParallelRead name=testParallelRead/ {code} On trunk: {code} testcase time=23.397 classname=org.apache.hadoop.hdfs.TestParallelRead name=testParallelReadCopying/ testcase time=133.218 classname=org.apache.hadoop.hdfs.TestParallelRead name=testParallelReadByteBuffer/ testcase time=61.364 classname=org.apache.hadoop.hdfs.TestParallelRead name=testParallelReadMixed/ {code} I also see a lot of blocked threads in the jstack on trunk. I asked Henry to take a look at this. TestParallelRead timing out on jenkins -- Key: HDFS-3243 URL: https://issues.apache.org/jira/browse/HDFS-3243 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, test Reporter: Todd Lipcon Assignee: Henry Robinson Trunk builds have been failing recently due to a TestParallelRead timeout. It doesn't report in the Jenkins failure list because surefire handles timeouts really poorly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3243) TestParallelRead timing out on jenkins
[ https://issues.apache.org/jira/browse/HDFS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250820#comment-13250820 ] Todd Lipcon commented on HDFS-3243: --- fwiw I tried copying the old TestParallelRead from branch-2 into trunk, and it runs just as fast there as it does in branch-2. So this seems like an issue with the new test code, rather than a regression in the read performance of the existing path. TestParallelRead timing out on jenkins -- Key: HDFS-3243 URL: https://issues.apache.org/jira/browse/HDFS-3243 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, test Reporter: Todd Lipcon Assignee: Henry Robinson Trunk builds have been failing recently due to a TestParallelRead timeout. It doesn't report in the Jenkins failure list because surefire handles timeouts really poorly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250954#comment-13250954 ] Todd Lipcon commented on HDFS-2983: --- How about the following proposal: - change the check for DN registration so that, if the DN's ctime differs from the NN's ctime (i.e the NN has started a snapshot style upgrade), then the version check will be strict - file a follow-up JIRA to add a cluster version summary to the web UI and to the NN metrics, allowing ops to monitor whether they have machines that might have missed a rolling upgrade Does that address your concern? I agree with your point that it can be confusing to manage, but not sure what the specific change you're asking for is. Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-base RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
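The HDFS-2983 proposal above amounts to a two-tier check: relaxed in the common case, strict while a snapshot-style upgrade is in flight. A sketch of that rule (the method and its parameters are illustrative, not the actual patch):

```java
// Sketch of the proposal: normally a DN may register if it runs the same
// release version as the NN (permitting rolling upgrades), but if the DN's
// storage ctime differs from the NN's -- meaning the NN has started a
// snapshot-style upgrade -- fall back to the strict exact-build check.
public class VersionCheckSketch {
    static boolean registrationAllowed(long dnCTime, long nnCTime,
                                       String dnBuild, String nnBuild,
                                       String dnRelease, String nnRelease) {
        if (dnCTime != nnCTime) {
            return dnBuild.equals(nnBuild);   // strict: exact build revision
        }
        return dnRelease.equals(nnRelease);   // relaxed: same release version
    }

    public static void main(String[] args) {
        // Matching ctimes: different builds of 2.0.0 can coexist (rolling upgrade).
        System.out.println(registrationAllowed(10, 10, "r1", "r2", "2.0.0", "2.0.0")); // true
        // ctime bumped by a snapshot upgrade: mixed builds are rejected.
        System.out.println(registrationAllowed(10, 11, "r1", "r2", "2.0.0", "2.0.0")); // false
    }
}
```

The follow-up JIRA mentioned in the comment (surfacing a cluster version summary in the web UI and NN metrics) would then let operators spot nodes that missed the rolling upgrade.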
[jira] [Commented] (HDFS-3245) Add metrics and web UI for cluster version summary
[ https://issues.apache.org/jira/browse/HDFS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250998#comment-13250998 ] Todd Lipcon commented on HDFS-3245: --- Another thing: we already have some per-datanode info in JMX. We should be sure to add the registered software version to this info (and to the DFS node list pages) bq. It would be great to have some sort of statistics about the clients as well I agree it would be nice, but we don't currently send software version strings in the IPC handshake or anything. So, I think we should handle it separately (this JIRA is just about exposing info we can already easily track) Add metrics and web UI for cluster version summary -- Key: HDFS-3245 URL: https://issues.apache.org/jira/browse/HDFS-3245 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0 Reporter: Todd Lipcon With the introduction of protocol compatibility, once HDFS-2983 is committed, we have the possibility that different nodes in a cluster are running different software versions. To aid operators, we should add the ability to summarize the status of versions in the cluster, so they can easily determine whether a rolling upgrade is in progress or if some nodes missed an upgrade (eg maybe they were out of service when the software was updated) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3247) Improve bootstrapStandby behavior when original NN is not active
[ https://issues.apache.org/jira/browse/HDFS-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251029#comment-13251029 ] Todd Lipcon commented on HDFS-3247: --- I think we could do one of the following: 1) Improve the error message to note that the admin should make sure the other NN is active before proceeding 2) Have it automatically transition it to active if this is the case. What do you think? I think 1 makes more sense, since the admin has to make an explicit decision. Improve bootstrapStandby behavior when original NN is not active Key: HDFS-3247 URL: https://issues.apache.org/jira/browse/HDFS-3247 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Currently, if you run bootstrapStandby while the first NN is in standby mode, it will spit out an ugly StandbyException with a trace. Instead, it should print an explanation that you should transition the first NN to active before bootstrapping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251118#comment-13251118 ] Todd Lipcon commented on HDFS-3094: --- +1, I'll commit this momentarily add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3248) bootstrapstanby repeated twice in hdfs namenode usage message
[ https://issues.apache.org/jira/browse/HDFS-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251123#comment-13251123 ] Todd Lipcon commented on HDFS-3248: --- +1, thanks for fixing this. bootstrapstanby repeated twice in hdfs namenode usage message - Key: HDFS-3248 URL: https://issues.apache.org/jira/browse/HDFS-3248 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3248.002.patch The HDFS usage message repeats bootstrapStandby twice. {code} Usage: java NameNode [-backup] | [-checkpoint] | [-format[-clusterid cid ]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-bootstrapStandby] | [-initializeSharedEdits] | [-bootstrapStandby] | [-recover [ -force ] ] {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3244) Remove dead writable code from hdfs/protocol
[ https://issues.apache.org/jira/browse/HDFS-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251122#comment-13251122 ] Todd Lipcon commented on HDFS-3244: --- +1, thanks for the cleanup, glad to be rid of that code. Remove dead writable code from hdfs/protocol Key: HDFS-3244 URL: https://issues.apache.org/jira/browse/HDFS-3244 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3244.txt While doing HDFS-3238 I noticed that there's more dead writable code in hdfs/protocol. Let's remove it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3243) TestParallelRead timing out on jenkins
[ https://issues.apache.org/jira/browse/HDFS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251127#comment-13251127 ] Todd Lipcon commented on HDFS-3243: --- +1, verified in the test results above that TestParallelRead passed relatively quickly. TestParallelRead timing out on jenkins -- Key: HDFS-3243 URL: https://issues.apache.org/jira/browse/HDFS-3243 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, test Reporter: Todd Lipcon Assignee: Henry Robinson Attachments: HDFS-3243.0.patch Trunk builds have been failing recently due to a TestParallelRead timeout. It doesn't report in the Jenkins failure list because surefire handles timeouts really poorly.
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251137#comment-13251137 ] Todd Lipcon commented on HDFS-3246: --- Agreed -- this would be particularly useful for HBase which does a lot of preads. pRead equivalent for direct read path - Key: HDFS-3246 URL: https://issues.apache.org/jira/browse/HDFS-3246 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Henry Robinson Assignee: Henry Robinson There is no pread equivalent in ByteBufferReadable. We should consider adding one. It would be relatively easy to implement for the distributed case (certainly compared to HDFS-2834), since DFSInputStream does most of the heavy lifting.
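The positional-read contract being discussed can be sketched in miniature. The interface and method names below are hypothetical illustrations, not the actual HDFS API; the key property is that a pread takes an explicit position and never moves the stream's cursor, which is why concurrent readers (e.g. HBase) can share one stream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical sketch of a pread-style addition to ByteBufferReadable.
public class ByteBufferPreadSketch {

    interface ByteBufferPositionedReadable {
        // Read into buf starting at the given file position, without
        // touching any stream cursor; returns the number of bytes read.
        int read(long position, ByteBuffer buf) throws IOException;
    }

    // Toy in-memory implementation, just to show the contract.
    static class InMemory implements ByteBufferPositionedReadable {
        private final byte[] data;
        InMemory(byte[] data) { this.data = data; }

        public int read(long position, ByteBuffer buf) {
            if (position >= data.length) return -1; // past EOF
            int n = Math.min(buf.remaining(), data.length - (int) position);
            buf.put(data, (int) position, n);
            return n;
        }
    }

    // Convenience demo: pread n bytes at pos from a string-backed source.
    public static String demo(String data, long pos, int n) {
        ByteBuffer buf = ByteBuffer.allocate(n);
        InMemory reader = new InMemory(data.getBytes());
        reader.read(pos, buf);
        return new String(buf.array(), 0, buf.position());
    }
}
```

Because the position is a parameter, two threads can issue `read(0, …)` and `read(1 << 20, …)` on the same object with no seek coordination, which is the property that makes this attractive for pread-heavy workloads.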
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251150#comment-13251150 ] Todd Lipcon commented on HDFS-2983: --- Cool, thanks Konstantin. Aaron, does the above proposal sound good to you too? Happy to re-review when you update the patch. Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251339#comment-13251339 ] Todd Lipcon commented on HDFS-2983: --- +1, reviewed the delta between the latest two patches. Looks good, and nice tests. Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
[jira] [Commented] (HDFS-3229) add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode.
[ https://issues.apache.org/jira/browse/HDFS-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249929#comment-13249929 ] Todd Lipcon commented on HDFS-3229: --- Can you give an example of the subtle issues you're referring to? The advantage of re-using HTTP is that we've already tested that code path, and it supports things like checksumming, etc. add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode. --- Key: HDFS-3229 URL: https://issues.apache.org/jira/browse/HDFS-3229 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Brandon Li Assignee: Brandon Li
[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.
[ https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249939#comment-13249939 ] Todd Lipcon commented on HDFS-3222: --- bq. I think our proposal won't work here, because by the time of hsync, DN will not report to NN anyway. On the first hflush() for a block, it calls NN.fsync(), which internally calls persistBlocks(). Currently, the fsync call doesn't give a length, but perhaps it could? The other thought is that, after a restart, a block that was previously being written would be in the under construction state, but with no expectedTargets. This differs from the case where a block has been allocated but not yet written to replicas. We could use that to set a new flag in the LocatedBlock response indicating that it's not a 0-length, but instead that it's corrupt. DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block. - Key: HDFS-3222 URL: https://issues.apache.org/jira/browse/HDFS-3222 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 1.0.3, 2.0.0, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-3222-Test.patch I have seen one situation with Hbase cluster. Scenario is as follows: 1)1.5 blocks has been written and synced. 2)Suddenly cluster has been restarted. Reader opened the file and trying to get the length., By this time partial block contained DNs are not reported to NN. So, locations for this partial block would be 0. In this case, DFSInputStream assumes that, 1 block size as final size. But reader also assuming that, 1 block size is the final length and setting his end marker. Finally reader ending up reading only partial data. Due to this, HMaster could not replay the complete edits. Actually this happend with 20 version. Looking at the code, same should present in trunk as well. 
{code} int replicaNotFoundCount = locatedblock.getLocations().length; for(DatanodeInfo datanode : locatedblock.getLocations()) { .. .. // Namenode told us about these locations, but none know about the replica // means that we hit the race between pipeline creation start and end. // we require all 3 because some other exception could have happened // on a DN that has it. we want to report that error if (replicaNotFoundCount == 0) { return 0; } {code}
[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.
[ https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250002#comment-13250002 ] Todd Lipcon commented on HDFS-3222: --- bq. My point is, even though client flushed the data, DNs will not report to NN right. Did you check the test above? Right, but the client reports to the NN. So, the client could report the number of bytes hflushed, and the NN could fill in the last block with that information when it persists it. bq. You mean we will retry until we get the locations? Yea -- treat it the same as we treat a corrupt file. {quote} 1) client wants to read some partial data which exists in first block itself, 2) open may try to get complete length, and that will block if we retry until DNs reports to NN. 3) But really that DNs down for long time. This time, we can not read even until the specified length, which is less than the start offset of partial block. {quote} That's true. Is it possible for us to change the client code to defer this code path until either (a) the client wants to read from the partial block, or (b) the client explictly asks for the file length? Alternatively, maybe this is so rare that it doesn't matter, and it's OK to disallow reading from an unrecovered file whose last block is missing all of its block locations after a restart. DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block. - Key: HDFS-3222 URL: https://issues.apache.org/jira/browse/HDFS-3222 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 1.0.3, 2.0.0, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-3222-Test.patch I have seen one situation with Hbase cluster. Scenario is as follows: 1)1.5 blocks has been written and synced. 2)Suddenly cluster has been restarted. 
Reader opened the file and trying to get the length., By this time partial block contained DNs are not reported to NN. So, locations for this partial block would be 0. In this case, DFSInputStream assumes that, 1 block size as final size. But reader also assuming that, 1 block size is the final length and setting his end marker. Finally reader ending up reading only partial data. Due to this, HMaster could not replay the complete edits. Actually this happend with 20 version. Looking at the code, same should present in trunk as well. {code} int replicaNotFoundCount = locatedblock.getLocations().length; for(DatanodeInfo datanode : locatedblock.getLocations()) { .. .. // Namenode told us about these locations, but none know about the replica // means that we hit the race between pipeline creation start and end. // we require all 3 because some other exception could have happened // on a DN that has it. we want to report that error if (replicaNotFoundCount == 0) { return 0; } {code}
[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.
[ https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250032#comment-13250032 ] Todd Lipcon commented on HDFS-3222: --- bq. It may be difficult for the clients to differ whether this is real corruption or it will be recovered after DN reports to NN. What's the difference? If none of the DNs holding a block have reported a replica, it's missing/corrupt. The same is true of finalized blocks - if three DNs crash, and we have no replicas anymore, it still might come back if an admin fixes one of the DNs. bq. you mean reader will pass the option? (a) or (b). Sorry, I wasn't clear. Right now, the behavior is that, when we call open() on a file which is under construction, we always go to the DNs holding the last block to find the length. My proposal is the following: - on open(), do not determine the visible length of the file. Set the member variable to something like -1 to indicate it's still unknown - in the code that opens a block reader, change it to check if it's about to read from the last block. If it is, try to determine the visible length. - in the explicit getVisibleLength() call, if it's not determined yet, try to determine the visible length With the above changes, we can allow a client who only wants to access the first blocks of a file to do so without having to contact the DNs holding the last block. But as soon as the client wants to access the under-construction block, or explicitly wants to know the visible length, then we go to the DNs. DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block. - Key: HDFS-3222 URL: https://issues.apache.org/jira/browse/HDFS-3222 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 1.0.3, 2.0.0, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-3222-Test.patch I have seen one situation with Hbase cluster. 
Scenario is as follows: 1)1.5 blocks has been written and synced. 2)Suddenly cluster has been restarted. Reader opened the file and trying to get the length., By this time partial block contained DNs are not reported to NN. So, locations for this partial block would be 0. In this case, DFSInputStream assumes that, 1 block size as final size. But reader also assuming that, 1 block size is the final length and setting his end marker. Finally reader ending up reading only partial data. Due to this, HMaster could not replay the complete edits. Actually this happend with 20 version. Looking at the code, same should present in trunk as well. {code} int replicaNotFoundCount = locatedblock.getLocations().length; for(DatanodeInfo datanode : locatedblock.getLocations()) { .. .. // Namenode told us about these locations, but none know about the replica // means that we hit the race between pipeline creation start and end. // we require all 3 because some other exception could have happened // on a DN that has it. we want to report that error if (replicaNotFoundCount == 0) { return 0; } {code}
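The three-step proposal in the comment above (leave the visible length unresolved at open(), and only go to the DataNodes when a read reaches the under-construction last block or the caller explicitly asks for the length) can be sketched as follows. All names here are illustrative, not the real DFSInputStream internals.

```java
// Hypothetical sketch of lazy visible-length resolution for a file whose
// last block is still under construction.
public class LazyLengthSketch {
    static final long UNKNOWN = -1;

    private long visibleLength = UNKNOWN;  // deliberately NOT resolved at open()
    private final long lastBlockStart;     // byte offset where the last block begins

    public LazyLengthSketch(long lastBlockStart) {
        this.lastBlockStart = lastBlockStart;
    }

    // Explicit getVisibleLength(): resolve on demand.
    public long getVisibleLength() {
        if (visibleLength == UNKNOWN) {
            visibleLength = fetchLastBlockLengthFromDatanodes();
        }
        return visibleLength;
    }

    // Checked when opening a block reader: only a read that reaches into
    // the last (under-construction) block forces us to contact the DNs.
    public boolean mustResolveBeforeRead(long readPos) {
        return readPos >= lastBlockStart;
    }

    // Stand-in for asking the last block's replicas how many bytes they
    // have acknowledged; here it pretends the last block is empty.
    private long fetchLastBlockLengthFromDatanodes() {
        return lastBlockStart;
    }
}
```

With this shape, a client reading only the first blocks never pays the cost (or hits the failure mode) of querying replicas of the last block after a cluster restart.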
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250068#comment-13250068 ] Todd Lipcon commented on HDFS-3094: --- Hi Aprit. The patch looks good now, but it seems to have developed some conflicts against trunk add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists.
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250070#comment-13250070 ] Todd Lipcon commented on HDFS-3094: --- oops, please excuse my typo of your name, _Arpit_! add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists.
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250112#comment-13250112 ] Todd Lipcon commented on HDFS-3004: --- Can you add a release note field for this issue, with brief description of the new feature and a pointer to the docs that describe it? Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.0.0 Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch, HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch, HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch, HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch, HDFS-3004.037.patch, HDFS-3004.038.patch, HDFS-3004.039.patch, HDFS-3004.040.patch, HDFS-3004.041.patch, HDFS-3004.042.patch, HDFS-3004.042.patch, HDFS-3004.042.patch, HDFS-3004.043.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. 
Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is.
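The prompting behavior described above ('-f' forces the first choice for every prompt, and typing 'a' at any single prompt switches into that mode from then on) can be sketched roughly like this. The class and method names are hypothetical, not the actual Recovery Mode code.

```java
import java.util.Scanner;

// Hypothetical sketch of recovery-mode operator prompting.
public class RecoveryPromptSketch {
    private boolean alwaysFirstChoice;

    // force == true corresponds to starting up with the '-f' flag.
    public RecoveryPromptSketch(boolean force) {
        this.alwaysFirstChoice = force;
    }

    // Ask the operator to pick among choices; choices[0] is the default
    // action taken when prompting is suppressed.
    public String ask(Scanner in, String question, String... choices) {
        if (alwaysFirstChoice) {
            return choices[0];        // no interaction needed
        }
        System.out.println(question + " " + String.join("/", choices)
                + " (or 'a' to always take the first choice)");
        String answer = in.nextLine().trim();
        if (answer.equals("a")) {
            alwaysFirstChoice = true; // sticky for all later prompts
            return choices[0];
        }
        return answer;
    }
}
```

The sticky flag is what lets one keystroke ('a') convert an interactive recovery session into a non-interactive one mid-run, matching the behavior the description outlines.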
[jira] [Commented] (HDFS-3055) Implement recovery mode for branch-1
[ https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250131#comment-13250131 ] Todd Lipcon commented on HDFS-3055: --- - can you explain the changes in FSNamesystem.java? - Can you update the logging in the test cases to use StringUtils.stringifyException to match trunk? - Did you run all the existing tests in branch-1? The one difference that I can see that might cause a failure is that the IOException thrown during a failed startup used to retain the exception {{t}} as its cause, but no longer does. Otherwise looks good. Implement recovery mode for branch-1 Key: HDFS-3055 URL: https://issues.apache.org/jira/browse/HDFS-3055 Project: Hadoop HDFS Issue Type: New Feature Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 1.0.0 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch Implement recovery mode for branch-1
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250152#comment-13250152 ] Todd Lipcon commented on HDFS-2983: --- I did a little investigation to try to answer Konstantin's questions above. First, I'll summarize our current behavior, verified on 0.23.1 release (I didn't understand this thoroughly before trying it out): - In a running cluster, if you restart the NN without the {{-upgrade}} flag, then the DataNodes will happily re-register without exiting. - If you restart the NN with {{-upgrade}}, then when the DN next heartbeats, it will fail the {{verifyRequest()}} check, since the registration ID's namespace fields no longer match (the ctime has been incremented by the upgrade). This causes the DataNode to exit. - Of course, restarting the DN at this point makes it take the snapshot and participate in the upgrade as expected. So, to try to respond to Konstantin's questions, here are a couple example scenarios: *Scenario 1*: rolling upgrade without doing a snapshot upgrade (for emergency bug fixes, hot fixes, MR fixes, other fixes which we don't expect to affect data reliability): - Leave the NN running, on the old version. - On each DN, in succession: (1) shutdown DN, (2) upgrade software to the new version, (3) start DN The above is sufficient if the changes are scoped only to DNs. If the change also affects the NN, then you will need to add the following step, either at the beginning or end of the process: - shutdown NN. upgrade installed software. start NN on new version In the case of an HA setup, we can do the NN upgrade without downtime: - shutdown SBN. upgrade SBN software. start SBN. - failover to SBN running new version. - Shutdown previous active. Upgrade software. 
Start previous active - Optionally fail back *Scenario 2*: upgrade to a version with a new layout version (LV) In this case, a snapshot style upgrade is required -- the NN will not restart without the -upgrade flag, and a DN will not connect to a NN with a different LV. So the scenario is the same as today: - Shutdown entire cluster - Upgrade all software in the cluster - Start cluster with {{-upgrade}} flag -- any nodes that missed the software upgrade will fail to connect, since their LV does not match (this patch retains that behavior) *Scenario 3*: upgrade to a version with same layout version, but some data risk (for example upgrading to a version with bug fixes pertaining to replication policies, corrupt block detection, etc) In this scenario, the NN does not mandate a {{-upgrade}} flag, but as Sanjay mentioned above, it can still be useful for data protection. As with today, if the user does not want the extra protection, this scenario can be treated identically to scenario 1. If the user does want the protection, it can be treated identically to scenario 2. Scenario 2 remains safe because of the check against the NameNode's {{ctime}} matching the DN's {{ctime}}. As soon as you restart the NN with the {{-upgrade}} flag, all running DNs will exit. Any newly started DN will notice the new namespace ctime and take part in the snapshot upgrade. Does the above description address your concerns? Another idea would be to add a new configuration option like {{dfs.allow.rolling.upgrades}} which enables the new behavior, so an admin who prefers not to use the feature can disallow it completely. Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T.
Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
[jira] [Commented] (HDFS-3216) DatanodeID should support multiple IP addresses
[ https://issues.apache.org/jira/browse/HDFS-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250181#comment-13250181 ] Todd Lipcon commented on HDFS-3216: --- bq. #2 Yes, when reading/writing DatanodeInfos to/from streams (same as before when creating a DatanodeID w/o a name) When do we read/write DatanodeInfo from streams, now that we are pb-ified? i.e. is the writable interface even used anymore? {code} + * Return the canonical IP address for this DatanodeID. Not all uses + * of DatanodeID are multi-IP aware, or would multiple IPs, therefore + * we use the first address as the canonical one. {code} ENOTASENTENCE bq. #1 We still need the notion of canonical IP, mostly for cases that don't care about multiple IP addresses. Updated the javadoc. How is it ensured that the canonical IP is kept consistent across DN restarts, for example? It's just whichever one is listed first in the DN-side configuration? bq. Fixed the cast, now casts to String and serializes/deserializes the IPs, the test does check this (was failing now passes). That's a little strange, to serialize it into a comma-separated list inside JSON. It's not possible to get Jackson to serialize it as a proper JSON array? Perhaps using a List<String> inside the map? DatanodeID should support multiple IP addresses --- Key: HDFS-3216 URL: https://issues.apache.org/jira/browse/HDFS-3216 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3216.txt, hdfs-3216.txt The DatanodeID has a single field for the IP address, for HDFS-3146 we need to extend it to support multiple addresses.
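The objection above is about the shape of the serialized output: putting a List<String> (rather than a pre-joined String) into the bean or map handed to Jackson is what makes the serializer emit a real JSON array. The sketch below hand-builds both fragments purely to illustrate the two shapes; it does not use Jackson itself.

```java
import java.util.List;

// Sketch contrasting a comma-joined string with a proper JSON array for
// a list of datanode IP addresses.
public class IpListJson {

    // "ips":"10.0.0.1,10.0.0.2" -- consumers must know to re-split the value.
    public static String asJoinedString(List<String> ips) {
        return "\"ips\":\"" + String.join(",", ips) + "\"";
    }

    // "ips":["10.0.0.1","10.0.0.2"] -- a real JSON array, parseable by any
    // JSON client without extra conventions.
    public static String asJsonArray(List<String> ips) {
        StringBuilder sb = new StringBuilder("\"ips\":[");
        for (int i = 0; i < ips.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(ips.get(i)).append('"');
        }
        return sb.append(']').toString();
    }
}
```

The array form also stays unambiguous if an address format ever contains a comma, which is the usual argument against encoding lists inside a single JSON string.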
[jira] [Commented] (HDFS-3055) Implement recovery mode for branch-1
[ https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250194#comment-13250194 ] Todd Lipcon commented on HDFS-3055: --- OK. +1, patch looks good. Please run all the branch-1 unit tests so we don't introduce any other failures - should be OK but best to be safe on the stable branch. When you report back, I'll commit. Implement recovery mode for branch-1 Key: HDFS-3055 URL: https://issues.apache.org/jira/browse/HDFS-3055 Project: Hadoop HDFS Issue Type: New Feature Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 1.0.0 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch, HDFS-3055-b1.006.patch Implement recovery mode for branch-1
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250211#comment-13250211 ] Todd Lipcon commented on HDFS-2983: --- {code} +if (!dnVersion.equals(nnVersion)) { + LOG.info("Reported DataNode version '" + dnVersion + "' does not match " + + "NameNode version '" + nnVersion + "' but is within acceptable " + + "limits. Note: This is normal during a rolling upgrade."); +} {code} Can you also please include the DN IP address in this log message? - Nice lengthy javadoc on VersionUtil.compareVersions. Can you please add something like: "This method of comparison is similar to the method used by package versioning systems like deb and RPM" and also maybe give one example of what you mean? e.g. add "For example, Hadoop 0.3 is considered less than Hadoop 0.20, even though naive string comparison would consider it larger." Otherwise, looks great. +1 from my standpoint. Konstantin/Sanjay - can you please comment regarding the above discussion? While I agree that there are more improvements to be made, I don't think this patch will hurt things. Or, if you are nervous about it, can we commit this with a flag to allow rolling upgrade if the operator permits it? Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases.
Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250235#comment-13250235 ] Todd Lipcon commented on HDFS-3094: --- Sorry again Arpit - looks like the commit of HDFS-3004 caused another conflict here just a couple hours ago... add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists.
[jira] [Commented] (HDFS-3229) add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode.
[ https://issues.apache.org/jira/browse/HDFS-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250251#comment-13250251 ] Todd Lipcon commented on HDFS-3229: --- bq. However, if we believe we need web UI for JournalNode, we need the port anyways. I think it's a good idea, since we have other endpoints in our default HTTP server that are very useful for ops -- for example the /jmx servlet and the /conf servlet can both be very handy. I also think exposing a basic web UI is helpful to operators who might try to understand the current state of the system. bq. Suppose we used HTTP server to synchronize the lagging JournalNode by downloading missed edit logs from another Journal Node. Firstly, the lagging JN needs to get (e.g., by asking the NN) a list of JNs with a full set of edit logs. Then, it downloads the missed logs from a good JN through http, while it could accept streamed logs from the NN through rpc at the same time. Given the two servers are working on different file sets (finalized logs vs in-progress log), synchronizing them seems not a concern. Right - this is the same process that the 2NN uses to synchronize finalized log segments from the NN. See SecondaryNameNode.downloadCheckpointFiles for the code. add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode. --- Key: HDFS-3229 URL: https://issues.apache.org/jira/browse/HDFS-3229 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Brandon Li Assignee: Brandon Li
[jira] [Commented] (HDFS-3236) NameNode does not initialize generic conf keys when started with -initializeSharedEditsDir
[ https://issues.apache.org/jira/browse/HDFS-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250312#comment-13250312 ] Todd Lipcon commented on HDFS-3236: --- +1 pending jenkins NameNode does not initialize generic conf keys when started with -initializeSharedEditsDir -- Key: HDFS-3236 URL: https://issues.apache.org/jira/browse/HDFS-3236 Project: Hadoop HDFS Issue Type: Bug Components: ha, name-node Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Attachments: HDFS-3236.patch This means that configurations that scope the location of the name/edits/shared edits dirs by nameservice or namenode won't work with `hdfs namenode -initializeSharedEdits'.
[jira] [Commented] (HDFS-3238) ServerCommand and friends don't need to be writables
[ https://issues.apache.org/jira/browse/HDFS-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250359#comment-13250359 ] Todd Lipcon commented on HDFS-3238: --- +1 pending jenkins results ServerCommand and friends don't need to be writables Key: HDFS-3238 URL: https://issues.apache.org/jira/browse/HDFS-3238 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3238.txt We can remove writable infrastructure from the ServerCommand classes as they're not used across clients and we're using PB within the server side.
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250366#comment-13250366 ] Todd Lipcon commented on HDFS-2983: --- bq. The scenario that scares me is if somebody does a snapshot, then several rolling upgrades, and then decides to rollback. This may be possible, but seems to be very much error-prone. Why is this scenario different than if somebody does a snapshot, then several _non-rolling_ upgrades, then decides to rollback? In both cases, we have the case of a newer version trying to do a rollback to an older version snapshot. Right? Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.
[ https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250369#comment-13250369 ] Todd Lipcon commented on HDFS-3222: --- Sounds good to me. DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block. - Key: HDFS-3222 URL: https://issues.apache.org/jira/browse/HDFS-3222 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 1.0.3, 2.0.0, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-3222-Test.patch I have seen one situation with Hbase cluster. Scenario is as follows: 1) 1.5 blocks has been written and synced. 2) Suddenly cluster has been restarted. Reader opened the file and trying to get the length. By this time partial block contained DNs are not reported to NN. So, locations for this partial block would be 0. In this case, DFSInputStream assumes that, 1 block size as final size. But reader also assuming that, 1 block size is the final length and setting his end marker. Finally reader ending up reading only partial data. Due to this, HMaster could not replay the complete edits. Actually this happened with 20 version. Looking at the code, same should present in trunk as well.
{code}
int replicaNotFoundCount = locatedblock.getLocations().length;
for(DatanodeInfo datanode : locatedblock.getLocations()) {
  ..
  ..
}
// Namenode told us about these locations, but none know about the replica
// means that we hit the race between pipeline creation start and end.
// we require all 3 because some other exception could have happened
// on a DN that has it. we want to report that error
if (replicaNotFoundCount == 0) {
  return 0;
}
{code}
[jira] [Commented] (HDFS-3146) Datanode should be able to register multiple network interfaces
[ https://issues.apache.org/jira/browse/HDFS-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249638#comment-13249638 ] Todd Lipcon commented on HDFS-3146: ---
{code}
+  public static InetSocketAddress[] getInterfaceAddrs(
+      String interfaceNames[], int port) throws UnknownHostException {
{code}
Sorry I missed this in the earlier review of this function, but I think it would be better to call the parameter something like {{interfaceSpecs}} -- because each one may specify an interface name, an IP address, or a subnet. In the same function, for the subnet case, you're using port 0 instead of the specified port. Looks like a mistake?
{code}
+        LOG.warn("Invalid address given " + addrString);
{code}
Nit: add a ':' to the log message
{code}
+    // If the datanode registered with an address we can't use
+    // then use the address the IPC came in on instead
+    if (NetUtils.isWildcardOrLoopback(nodeReg.getIpAddr())) {
{code}
I found this comment a little unclear. Under what circumstance would the DN pass a loopback or wildcard IP? Aren't they filtered on the DN side? I think this should be at least a WARN, or maybe even throw an exception to disallow the registration. Edit: I got to the part later in the patch where the DN potentially sends a wildcard to the NN. I think it might be simpler to have the DN send an empty list to the NN if it's bound to wildcard -- and adjust the comment here to explain why it would be registering with no addresses.
{code}
+    // TODO: haven't determined the port yet, using default
{code}
Are you planning another patch to fix this on the branch before merging? What's the backward-compatibility path with the existing configurations for bind address, etc, where the port's specified? We should be clear about which takes precedence, and throw errors on startup if both are configured, I think? Maybe it makes sense to change these to just be InetAddress instead of InetSocketAddress, and never fill in a port there?
This patch should add the new config to hdfs-default, and edit the existing config's documentation to explain how the two interact.
{code}
+    if (0 != interfaceStrs.length) {
+      LOG.info("Using interfaces [" +
+      Joiner.on(',').join(interfaceStrs) + "] with addresses [" +
+      Joiner.on(',').join(interfaceAddrs) + "]");
+    }
{code}
- need indentation for the joiner lines - add comment explaining how this eventually gets filled in, if it's empty?
{code}
+   * @param addrs socket addresses to convert
+   * @return an array of strings of IPs for the given addresses
+   */
+  public static String[] toIpAddrStrings(InetSocketAddress[] addrs) {
{code}
javadoc should specify that ports aren't included in the stringification of addresses Datanode should be able to register multiple network interfaces --- Key: HDFS-3146 URL: https://issues.apache.org/jira/browse/HDFS-3146 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3146.txt The Datanode should register multiple interfaces with the Namenode (who then forwards them to clients). We can do this by extending the DatanodeID, which currently just contains a single interface, to contain a list of interfaces. For compatibility, the DatanodeID method to get the DN address for data transfer should remain unchanged (multiple interfaces are only used where the client explicitly takes advantage of them). By default, if the Datanode binds on all interfaces (via using the wildcard in the dfs*address configuration) all interfaces are exposed, modulo ones like the loopback that should never be exposed. Alternatively, a new configuration parameter ({{dfs.datanode.available.interfaces}}) allows the set of interfaces to be specified explicitly in case the user only wants to expose a subset.
If the new default behavior is too disruptive we could default dfs.datanode.available.interfaces to be the IP of the IPC interface which is the only interface exposed today (per HADOOP-6867, only the port from dfs.datanode.address is used today). The interfaces can be specified by name (eg eth0), subinterface name (eg eth0:0), or IP address. The IP address can be specified by range using CIDR notation so the configuration values are portable.
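The CIDR-range matching described above can be sketched in plain Java. This is an illustrative helper, not code from the patch; the class and method names are hypothetical:

```java
// Illustrative check of whether an IPv4 address falls inside a CIDR range,
// the kind of matching dfs.datanode.available.interfaces would need.
// This helper is hypothetical, not part of the actual HDFS-3146 patch.
public class CidrMatch {
  private static int toInt(String ipv4) {
    String[] p = ipv4.split("\\.");
    return (Integer.parseInt(p[0]) << 24) | (Integer.parseInt(p[1]) << 16)
        | (Integer.parseInt(p[2]) << 8) | Integer.parseInt(p[3]);
  }

  /** True if addr lies within the subnet given in CIDR notation, e.g. "10.0.0.0/8". */
  public static boolean inSubnet(String addr, String cidr) {
    String[] parts = cidr.split("/");
    int prefixLen = Integer.parseInt(parts[1]);
    int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
    // Two addresses are in the same subnet iff their masked prefixes match.
    return (toInt(addr) & mask) == (toInt(parts[0]) & mask);
  }
}
```

This is what makes the configuration value portable across nodes: each datanode keeps whichever of its local addresses fall inside the configured range, without per-host config.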
[jira] [Commented] (HDFS-3216) DatanodeID should support multiple IP addresses
[ https://issues.apache.org/jira/browse/HDFS-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249640#comment-13249640 ] Todd Lipcon commented on HDFS-3216: --- Should we deprecate this function? Or do we need some concept of the canonical/main IP address? If the latter, we should explain this in the javadoc of this function.
{code}
   public String getIpAddr() {
-    return ipAddr;
+    return ipAddrs[0];
   }
{code}
- is it ever valid to construct a DatanodeID with no IP addresses? If not we should add a Preconditions check or at least an assert on the length of the ipAddrs array in the constructor and the setter
{code}
+    return new DatanodeID(ipAddrs.toArray(new String[ipAddrs.size()]), dn.getHostName(), dn.getStorageID(), dn.getXferPort(), dn.getInfoPort(), dn.getIpcPort());
{code}
Can you re-wrap this to 80 chars? - Is the code change in JsonUtil covered by TestJsonUtil? (are you sure that the cast to String[] is right?) - in some of the tests, it's filling in hostnames instead of IPs for the ipAddrs field. Is that right, or do we expect that it will always be resolved IPs? The dual nature makes me nervous. DatanodeID should support multiple IP addresses --- Key: HDFS-3216 URL: https://issues.apache.org/jira/browse/HDFS-3216 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3216.txt The DatanodeID has a single field for the IP address, for HDFS-3146 we need to extend it to support multiple addresses.
[jira] [Commented] (HDFS-3218) Use multiple remote DN interfaces for block transfer
[ https://issues.apache.org/jira/browse/HDFS-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249641#comment-13249641 ] Todd Lipcon commented on HDFS-3218: --- - I think it would make sense to add a utility function like {{DFSUtil.getRandomXferAddress(DatanodeID)}}, since you have a lot of repetition fo the {{DFSUtil.getRandom().nextInt}} stuff. Or even make it a member function of the DatanodeID? Otherwise looks good. Use multiple remote DN interfaces for block transfer Key: HDFS-3218 URL: https://issues.apache.org/jira/browse/HDFS-3218 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3218.txt HDFS-3146 and HDFS-3216 expose multiple DN interfaces to the client. In order for clients, in aggregate, to use multiple DN interfaces clients should pick different interfaces when transferring blocks. Given that we cache client - DN connections the policy of picking a remote interface at random for each new connection seems best (vs round robin for example). In the future we could make the client congestion aware. We could also establish multiple connections between the client and DN and therefore use multiple interfaces for a single block transfer. Both of those are out of scope for this jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
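The suggested helper might look roughly like this. A minimal sketch: taking a String[] directly is a simplification, since the real method would accept a DatanodeID and read its list of transfer addresses:

```java
import java.util.Random;

// Sketch of a DFSUtil.getRandomXferAddress-style helper as suggested in the
// review above. The String[] parameter is a hypothetical stand-in for the
// transfer addresses carried by a DatanodeID.
public class RandomXfer {
  private static final Random RANDOM = new Random();

  /** Picks one of the datanode's transfer addresses uniformly at random. */
  public static String getRandomXferAddress(String[] xferAddrs) {
    if (xferAddrs.length == 0) {
      throw new IllegalArgumentException("datanode exposes no addresses");
    }
    return xferAddrs[RANDOM.nextInt(xferAddrs.length)];
  }
}
```

Because client-DN connections are cached, a uniform random pick per new connection spreads traffic across interfaces in aggregate without any per-client state, which is why it is preferred over round robin here.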
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249646#comment-13249646 ] Todd Lipcon commented on HDFS-2983: --- bq. Technically the quote characters inside the Javadoc should be - or you could just use single-quotes instead to avoid the hassle. erg, JIRA went and formatted my explanation :) The quote characters should be {{ &quot;}} without the space. Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249645#comment-13249645 ] Todd Lipcon commented on HDFS-2983: --- - Can VersionUtil be made abstract, since it only has static methods?
{code}
+ * This function splits the two versions on "." and performs a lexical
+ * comparison of the resulting components.
{code}
Technically the quote characters inside the Javadoc should be &quot; - or you could just use single-quotes instead to avoid the hassle. VersionUtil should be doing numeric comparison rather than straight string comparison. For example 10.0.0 should be considered greater than 2.0, but I think the current implementation doesn't implement this correctly. Please add a test for this case to TestVersionUtil as well.
{code}
+  private static void assertExpectedValues(String lower, String higher) {
+    assertTrue(0 > VersionUtil.compareVersions(lower, higher));
+    assertTrue(0 < VersionUtil.compareVersions(higher, lower));
+  }
{code}
These comparisons read backwards to me. ie should be:
{code}
+  private static void assertExpectedValues(String lower, String higher) {
+    assertTrue(VersionUtil.compareVersions(lower, higher) < 0);
+    assertTrue(VersionUtil.compareVersions(higher, lower) > 0);
+  }
{code}
don't you think?
{code}
+    if (VersionUtil.compareVersions(dnVersion, minimumDataNodeVersion) < 0) {
+      IncorrectVersionException ive = new IncorrectVersionException(
+          minimumDataNodeVersion, dnVersion, "DataNode", "NameNode");
+      LOG.warn(ive.getMessage());
+      throw ive;
+    }
{code}
Here, does the log message end up including the remote IP address somehow? If not, I think we should improve it to include that (and maybe the stringified DatanodeRegistration object) Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T.
Myers Attachments: HDFS-2983.patch, HDFS-2983.patch, HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
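The numeric, component-wise comparison requested in the review above can be sketched as follows. This is an illustrative standalone class, not the committed VersionUtil:

```java
// Illustrative sketch of the numeric version comparison Todd asks for;
// the real VersionUtil in the committed patch may differ in detail.
public class VersionCompareSketch {
  /** Returns <0, 0, or >0 as v1 is less than, equal to, or greater than v2. */
  public static int compareVersions(String v1, String v2) {
    String[] a = v1.split("\\.");
    String[] b = v2.split("\\.");
    for (int i = 0; i < Math.max(a.length, b.length); i++) {
      // Missing components compare as "0", so 1.0 == 1.0.0.
      String c1 = i < a.length ? a[i] : "0";
      String c2 = i < b.length ? b[i] : "0";
      int result;
      try {
        // Numeric comparison so that "10" > "2", unlike plain string order.
        result = Integer.compare(Integer.parseInt(c1), Integer.parseInt(c2));
      } catch (NumberFormatException e) {
        // Non-numeric components (e.g. "0a") fall back to string order.
        result = c1.compareTo(c2);
      }
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }
}
```

Under this scheme 10.0.0 compares greater than 2.0 because the first components are compared as numbers, while a suffixed component such as 0a falls back to string order and sorts after the bare 0.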
[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs
[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249181#comment-13249181 ] Todd Lipcon commented on HDFS-3192: --- The state diagram is included in the design doc attached to HDFS-2185. Please comment with an example scenario in which you think there is an incorrect behavior - I don't know of any aside from HADOOP-8217, but if you know of some I'd be really happy to address them rather than find out about them from a broken customer :) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs -- Key: HDFS-3192 URL: https://issues.apache.org/jira/browse/HDFS-3192 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Hari Mankude Assignee: Hari Mankude
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249182#comment-13249182 ] Todd Lipcon commented on HDFS-2983: --- bq. The proposal seems to suggest that the NN does not need to be updated if desired. Correct? Yes, I think that's correct, and desired. Sometimes upgrades only address the slave nodes, so there's no sense having to change the NN. Of course, with HA, upgrading the NN isn't as big a problem, but even so it is a more complicated/delicate operation. bq. I see why it is desirable but can we simplify things or make upgrades safer if we drop that requirement? I don't know if it makes things much simpler. I think adding a requirement that the NN upgrade before the DNs is quite inconvenient for operators. But I am not 100% sure of this, and willing to be convinced :) Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible.
[jira] [Commented] (HDFS-3229) add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode.
[ https://issues.apache.org/jira/browse/HDFS-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249183#comment-13249183 ] Todd Lipcon commented on HDFS-3229: --- I'd recommend reusing the code/protobufs for the existing getEditLogManifest() calls that the 2NN uses to transfer logs, here. add JournalProtocol RPCs to list finalized edit segments, and read edit segment file from JournalNode. --- Key: HDFS-3229 URL: https://issues.apache.org/jira/browse/HDFS-3229 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Brandon Li Assignee: Brandon Li
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248587#comment-13248587 ] Todd Lipcon commented on HDFS-3212: --- bq. Todd, if you are referring to creating an edit log with the name format edit_log_<epoch_num>_in_progress or when finalized edit_log_<epoch_number>_<start_txid>_<end_txid>, it is a better solution than creating a separate metadata file. Sure, that works too. Except you'll have to change a ton of FileJournalManager code paths to do this... bq. Otherwise, Suresh's solution in adding the epoch number in start log segment sounds good. I still think that's really wrong, because transaction _data_ is separate from transaction _storage_. Epoch numbers are a storage layer thing. bq. Actually, for debugging purposes, we should add more information such as time when the journal was started, NN id of owner etc along with epoch number I agree with all of the above, except for the epoch number. The timestamp, NN id, hostname, etc, are all NN-layer things, whereas the epoch number is an edits storage layer thing. Persist the epoch received by the JournalService Key: HDFS-3212 URL: https://issues.apache.org/jira/browse/HDFS-3212 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: Shared journals (HDFS-3092) Reporter: Suresh Srinivas epoch received over JournalProtocol should be persisted by JournalService.
[jira] [Commented] (HDFS-3217) ZKFC should restart NN when healthmonitor gets a SERVICE_NOT_RESPONDING exception
[ https://issues.apache.org/jira/browse/HDFS-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248589#comment-13248589 ] Todd Lipcon commented on HDFS-3217: --- I disagree. It is an explicit decision to not have the ZKFC act as a service supervisor, because it adds a lot of complexity. There already exist lots of solutions for service management - we assume that the user is already using something like puppet, daemontools, supervisord, cron, etc, to make sure the daemon restarts eventually. ZKFC should restart NN when healthmonitor gets a SERVICE_NOT_RESPONDING exception - Key: HDFS-3217 URL: https://issues.apache.org/jira/browse/HDFS-3217 Project: Hadoop HDFS Issue Type: Sub-task Components: auto-failover, ha Reporter: Hari Mankude Assignee: Hari Mankude
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248595#comment-13248595 ] Todd Lipcon commented on HDFS-2983: --- Here's another proposal which I think makes sense: 1) Ensure that version compatibility is checked both on the NN side and the DN side. So, when the DN first connects to the NN, the NN verifies the DN's version. If it is deemed incompatible, it is rejected. Then, the DN verifies the NN version in the response. If it is deemed incompatible, it does not proceed with registration. 2) Add a function to compare two version numbers in the straightforward manner: split the numbers on ".", then componentwise, do comparisons according to string numerical value (like sort -n). Some examples: 2.0.1 > 2.0.0. 10.0 > 2.0.0. 2.0.0a > 2.0.0. 2.0.0b > 2.0.0a. (this is the comparison mechanism package managers tend to use) 3) In hdfs-default.xml, add a configuration like {{cluster.min.supported.version}}. In branch-2, we set this to 2.0.0. So, by default, any 2.x.x can talk to any other 2.x.x. When we release 3.x.x, if it is incompatible with 2.x.x, then we just need to bump that config in 3.0's hdfs-default.xml. This supports the following use cases/requirements: - rolling upgrade can be done for most users without having to change any configs. - new versions of Hadoop can be marked incompatible with old versions of Hadoop - cluster admins can still override it if they want to disallow older nodes from connecting. For example, imagine there is a critical security bug fixed in 2.0.0a - the admin can set the config to 2.0.0a, and then 2.0.0 nodes may no longer join the cluster (even though they are protocol-wise compatible) Thoughts? Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T.
Myers Attachments: HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
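The componentwise comparison Todd proposes in point 2 could be sketched as follows. This is an illustrative sketch of the proposal, not Hadoop's actual implementation; the class name is hypothetical. Components that are both numeric compare numerically (so 10.0 > 2.0.0), anything else falls back to string comparison (so 2.0.0b > 2.0.0a), much like package-manager version ordering.

```java
public class VersionComparator {
    // Compare two dotted version strings componentwise: numeric comparison
    // when both components are pure digits, lexicographic otherwise.
    // Returns <0, 0, or >0 like Comparator.compare.
    public static int compare(String a, String b) {
        String[] as = a.split("\\.");
        String[] bs = b.split("\\.");
        int n = Math.max(as.length, bs.length);
        for (int i = 0; i < n; i++) {
            // A missing component compares as the empty string, so a
            // longer version with a nonempty extra component sorts later.
            String x = i < as.length ? as[i] : "";
            String y = i < bs.length ? bs[i] : "";
            int c;
            if (x.matches("\\d+") && y.matches("\\d+")) {
                c = Long.compare(Long.parseLong(x), Long.parseLong(y));
            } else {
                c = x.compareTo(y);
            }
            if (c != 0) return c;
        }
        return 0;
    }
}
```

With this, a {{cluster.min.supported.version}} check is just `compare(peerVersion, minSupported) >= 0`.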
[jira] [Commented] (HDFS-3222) DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.
[ https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248667#comment-13248667 ] Todd Lipcon commented on HDFS-3222: --- Nice catch, Uma. I think we can use the length field of the block in the NN metadata to solve this, right? The first hflush()/sync() call from the client will cause persistBlocks() to be called, which should write down the block with a non-zero length. Then on restart, we can use this length instead of 0 when the replicas aren't found. DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block. - Key: HDFS-3222 URL: https://issues.apache.org/jira/browse/HDFS-3222 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 1.0.3, 2.0.0, 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G I have seen one situation with an HBase cluster. The scenario is as follows:

1) 1.5 blocks have been written and synced.
2) Suddenly the cluster was restarted.

A reader opened the file and tried to get the length. By this time the DNs holding the partial block had not yet reported to the NN, so the locations for this partial block were 0. In this case, DFSInputStream assumes 1 block size as the final size, and the reader sets its end marker accordingly. Finally the reader ends up reading only partial data. Due to this, HMaster could not replay the complete edits. Actually this happened with the 0.20 version. Looking at the code, the same should be present in trunk as well.

{code}
int replicaNotFoundCount = locatedblock.getLocations().length;
for(DatanodeInfo datanode : locatedblock.getLocations()) {
  ..
  ..
}
// Namenode told us about these locations, but none know about the replica
// means that we hit the race between pipeline creation start and end.
// we require all 3 because some other exception could have happened
// on a DN that has it. we want to report that error
if (replicaNotFoundCount == 0) {
  return 0;
}
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
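The fix Todd suggests, falling back to the block length the NN persisted (via persistBlocks() after the client's first hflush/sync) instead of returning 0 when no replica locations are reported, could be sketched like this. All names here are hypothetical and chosen for illustration; this is not the real DFSInputStream API.

```java
// Sketch of the proposed fallback (hypothetical names, not the actual
// DFSInputStream code): when the last partial block has zero reported
// locations, trust the length recorded in NN metadata rather than
// silently treating the block as empty.
public class LastBlockLength {
    public static long lengthOfLastBlock(int reportedLocations,
                                         long lengthFromReplicas,
                                         long lengthPersistedOnNameNode) {
        if (reportedLocations == 0) {
            // No DN has reported the replica yet (e.g. just after a
            // cluster restart): use the NN-persisted length, not 0.
            return lengthPersistedOnNameNode;
        }
        // Normal case: the length fetched from a live replica wins,
        // since it may be ahead of what the NN has persisted.
        return lengthFromReplicas;
    }
}
```

A reader would then set its end marker from this value instead of assuming the partial block contributes nothing.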
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248716#comment-13248716 ] Todd Lipcon commented on HDFS-2983: --- Hey Sanjay. I agree that the snapshot-on-upgrade feature is really important, and I don't think this work precludes/breaks that. Here's my line of thinking: - even with the ability to do rolling upgrade, there is no restriction that you _must_ do upgrades like this. So, you could still decide to use the current upgrade process as a policy decision. - As you mentioned, many upgrades/hotfixes/EBFs don't touch core code, so for those, most people would prefer a rolling upgrade without downtime. - separately, after this is committed, we can work on figuring out a strategy that allows you to do an upgrade-style snapshot before starting the rolling upgrade. It looks like you just filed HDFS-3225 for this, so let's continue this discussion there. Agree? Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3203) Currently, the Checkpointer is controlled by time. In this way, it must checkpoint even when there is only one transaction in the checkpoint period. I think it needs to add file size to control checkpoint
[ https://issues.apache.org/jira/browse/HDFS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247068#comment-13247068 ] Todd Lipcon commented on HDFS-3203: --- You can configure dfs.namenode.checkpoint.txns to the desired number of transactions, and then set dfs.namenode.checkpoint.period to a very high value. This will give you the desired behavior. Does that not satisfy your requirements? Currently, the Checkpointer is controlled by time. In this way, it must checkpoint even when there is only one transaction in the checkpoint period. I think it needs to add file size to control checkpoint -- Key: HDFS-3203 URL: https://issues.apache.org/jira/browse/HDFS-3203 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0, 2.0.0 Reporter: liaowenrui -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
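Concretely, the two settings Todd mentions would be combined like this in hdfs-site.xml. The property names come from the comment above; the values are illustrative (a transaction threshold plus a deliberately large period, so the txn count effectively drives checkpointing):

```xml
<!-- Checkpoint primarily by transaction count: fire after 1,000,000
     uncheckpointed txns (illustrative value)... -->
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
<!-- ...and push the time-based trigger out to a week (in seconds),
     so it rarely fires first. -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>604800</value>
</property>
```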
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247071#comment-13247071 ] Todd Lipcon commented on HDFS-3150: --- Sorry, I should have said +1 assuming these changes are addressed in my above comment. Since Eli addressed my comments, here's my official +1 for the patch. Add option for clients to contact DNs via hostname in branch-1 -- Key: HDFS-3150 URL: https://issues.apache.org/jira/browse/HDFS-3150 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0 Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt Per the document attached to HADOOP-8198, this is just for branch-1, and unbreaks DN multihoming. The datanode can be configured to listen on a bond, or all interfaces by specifying the wildcard in the dfs.datanode.*.address configuration options, however per HADOOP-6867 only the source address of the registration is exposed to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup, this had the side effect of breaking DN multihoming. In order to fix it let's add back the option for Datanodes to be accessed by hostname. This can be done by: # Modifying the primary field of the Datanode descriptor to be the hostname, or # Modifying Client/Datanode - Datanode access use the hostname field instead of the IP I'd like to go with approach #2 as it does not require making an incompatible change to the client protocol, and is much less invasive. It minimizes the scope of modification to just places where clients and Datanodes connect, vs changing all uses of Datanode identifiers. 
New client and Datanode configuration options are introduced: - {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside cluster may not be able to route the IP) - {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer If the configuration options are not used, there is no change in the current behavior. I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc) based on the context the ID is being used in, vs always using the IP:xferPort as the Datanode's name, and using the name everywhere. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
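Enabling the two new options described above would look like this in hdfs-site.xml. Both default to off, matching the statement that behavior is unchanged when the options are not used:

```xml
<!-- Clients connect to DNs by hostname (useful when clients outside
     the cluster cannot route the DN's IP). -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
<!-- DNs also use hostnames when connecting to other DNs for transfer. -->
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>
```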
[jira] [Commented] (HDFS-3204) Minor modification to JournalProtocol.proto to make it generic
[ https://issues.apache.org/jira/browse/HDFS-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247431#comment-13247431 ] Todd Lipcon commented on HDFS-3204: --- A few small typos: + optional uint32 namespceID = 3;// Namespace ID and here: +// convertion happens for messages from Namenode to Journal receivers. otherwise seems good modulo investigating TestBackupNode failure Minor modification to JournalProtocol.proto to make it generic -- Key: HDFS-3204 URL: https://issues.apache.org/jira/browse/HDFS-3204 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0 Reporter: Suresh Srinivas Attachments: HDFS-3204.txt JournalProtocol.proto uses NamenodeRegistration in methods such as journal() for identifying the source. I want to make it generic so that the method can be called with journal information to identify the journal. I plan to use the protocol also for sync purposes, where the source of the journal can be some thing other than namenode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3203) Currently, the Checkpointer is controlled by time. In this way, it must checkpoint even when there is only one transaction in the checkpoint period. I think it needs to add file size to control checkpoint
[ https://issues.apache.org/jira/browse/HDFS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247441#comment-13247441 ] Todd Lipcon commented on HDFS-3203: --- I'm sorry, I don't understand your question. Can you please clarify? Currently, the Checkpointer is controlled by time. In this way, it must checkpoint even when there is only one transaction in the checkpoint period. I think it needs to add file size to control checkpoint -- Key: HDFS-3203 URL: https://issues.apache.org/jira/browse/HDFS-3203 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0, 2.0.0 Reporter: liaowenrui -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3161) 20 Append: Excluded DN replica from recovery should be removed from DN.
[ https://issues.apache.org/jira/browse/HDFS-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247446#comment-13247446 ] Todd Lipcon commented on HDFS-3161: --- If this is 0.20-append specific, we've recently decided to disable the append() call in that branch (and only support sync()). So, I don't think the append-related scenario is worth worrying about (assuming it works correctly in the trunk implementation) 20 Append: Excluded DN replica from recovery should be removed from DN. --- Key: HDFS-3161 URL: https://issues.apache.org/jira/browse/HDFS-3161 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Reporter: suja s Priority: Critical Fix For: 1.0.3 1) DN1-DN2-DN3 are in pipeline. 2) Client killed abruptly 3) one DN has restarted , say DN3 4) In DN3 info.wasRecoveredOnStartup() will be true 5) NN recovery triggered, DN3 skipped from recovery due to above check. 6) Now DN1, DN2 has blocks with generataion stamp 2 and DN3 has older generation stamp say 1 and also DN3 still has this block entry in ongoingCreates 7) as part of recovery file has closed and got only two live replicas ( from DN1 and DN2) 8) So, NN issued the command for replication. Now DN3 also has the replica with newer generation stamp. 9) Now DN3 contains 2 replicas on disk. and one entry in ongoing creates with referring to blocksBeingWritten directory. When we call append/ leaseRecovery, it may again skip this node for that recovery as blockId entry still presents in ongoingCreates with startup recovery true. It may keep continue this dance for evry recovery. And this stale replica will not be cleaned untill we restart the cluster. Actual replica will be trasferred to this node only through replication process. Also unnecessarily that replicated blocks will get invalidated after next recoveries -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247472#comment-13247472 ] Todd Lipcon commented on HDFS-2983: --- Maybe I'm misunderstanding, but I thought the plan for this JIRA was to add a more structured version number with major/minor/patch components. Then have the check still verify that the major/minor match up, but not verify the patch level and svn revision? That is to say, we should loosen the restriction but not entirely drop it. Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Aaron T. Myers Attachments: HDFS-2983.patch Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash, Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it takes another 23 minor release or so before we're ready to commit to making the minor versions compatible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3211) JournalProtocol changes required for introducing epoch and fencing
[ https://issues.apache.org/jira/browse/HDFS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247790#comment-13247790 ] Todd Lipcon commented on HDFS-3211: --- Hi Suresh. Have you looked at HDFS-3189? Hopefully we can make our protocols similar with the intent of eventually merging the two implementations. JournalProtocol changes required for introducing epoch and fencing -- Key: HDFS-3211 URL: https://issues.apache.org/jira/browse/HDFS-3211 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: Shared journals (HDFS-3092) Reporter: Suresh Srinivas Assignee: Suresh Srinivas JournalProtocol changes to introduce epoch in every request. Adding new method fence for fencing a JournalService. On BackupNode fence is a no-op. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3211) JournalProtocol changes required for introducing epoch and fencing
[ https://issues.apache.org/jira/browse/HDFS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247800#comment-13247800 ] Todd Lipcon commented on HDFS-3211: --- Hi Suresh. You need to store the epoch persistently on disk to handle the case of journal daemon restarts, I think. HDFS-3190 does a refactor to add a utility class you can use for this. JournalProtocol changes required for introducing epoch and fencing -- Key: HDFS-3211 URL: https://issues.apache.org/jira/browse/HDFS-3211 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: Shared journals (HDFS-3092) Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-3211.txt, HDFS-3211.txt JournalProtocol changes to introduce epoch in every request. Adding new method fence for fencing a JournalService. On BackupNode fence is a no-op. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
[ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247805#comment-13247805 ] Todd Lipcon commented on HDFS-3110: --- Hey Henry. The patch looks good, but I can't figure out how to run the test. When I do mvn -Pnative -DskipTests install, it builds libhdfs, but doesn't build the hdfs_test binary. Can you post instructions on how to run the test manually? Then we can do another jira to make it more automatic. libhdfs implementation of direct read API - Key: HDFS-3110 URL: https://issues.apache.org/jira/browse/HDFS-3110 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs Reporter: Henry Robinson Assignee: Henry Robinson Fix For: 0.24.0 Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch, HDFS-3110.2.patch, HDFS-3110.3.patch, HDFS-3110.4.patch, HDFS-3110.5.patch Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads to significant performance increases when reading local data from C. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247847#comment-13247847 ] Todd Lipcon commented on HDFS-3212: --- I don't think it's reasonable to put the epoch number inside the START transaction, because that leaks the idea of epochs out of the journal manager layer into the NN layer. Also, if the JN restarts, when it comes up, how do you make sure that an old NN doesn't come back to life with a startLogSegment transaction? I think you need to record the epoch number separately from the idea of segments, for fencing purposes, since you aren't always guaranteed to be in the middle of a segment, and you don't want disagreement about who gets to call startLogSegment. Persist the epoch received by the JournalService Key: HDFS-3212 URL: https://issues.apache.org/jira/browse/HDFS-3212 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: Shared journals (HDFS-3092) Reporter: Suresh Srinivas epoch received over JournalProtocol should be persisted by JournalService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3213) JournalDaemon (server) should persist the cluster id and nsid in the storage directory
[ https://issues.apache.org/jira/browse/HDFS-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247849#comment-13247849 ] Todd Lipcon commented on HDFS-3213: --- I'm assuming you'll use StorageDirectory here, which will take care of this all for you, right? JournalDaemon (server) should persist the cluster id and nsid in the storage directory -- Key: HDFS-3213 URL: https://issues.apache.org/jira/browse/HDFS-3213 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Hari Mankude Assignee: Hari Mankude -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247871#comment-13247871 ] Todd Lipcon commented on HDFS-3212: --- bq. Is it the case that JN will reject it since the old NN has a smaller epoch? Right -- that's why it needs to persist, IMO. bq. 2. might be less optimal because now it consists of 2 operations. 1) rolling the log and creating a new segment 2) updating a metadata file. I think it's just a matter of getting the ordering right. Before starting a log segment, you need to fence prior writers. The fencing step is what writes down the epoch. Then, when you create a new log segment, you tag it (eg by storing it in a directory per-epoch, or by writing a metadata file next to it before you create the file). I think this is sufficiently atomic. bq. So 2 edit logs with same txid but can be differentiated using epochs I've had another idea which I want to write up in the design doc. But, basically, I think we can solve this problem more simply by the following: - Currently, when FSEditLog starts a new segment, it calls journal.startLogSegment(), then journal.logEdit(StartLogSegmentOp), then journal.logSync(). So there is a point of time when the log segment is empty, with no transactions. If instead, we changed it so that the startLogSegment() call was responsible for writing the first transaction (and only the first), atomically, then we might not have a problem. We just have to make the restriction that the first transaction of any segment is always deterministic (eg just START_LOG_SEGMENT(txid) and nothing else). Let me revise the design doc in HDFS-3077 with this idea to see if it works when fully fleshed out. 
Persist the epoch received by the JournalService Key: HDFS-3212 URL: https://issues.apache.org/jira/browse/HDFS-3212 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: Shared journals (HDFS-3092) Reporter: Suresh Srinivas epoch received over JournalProtocol should be persisted by JournalService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247894#comment-13247894 ] Todd Lipcon commented on HDFS-3212: --- bq. I do not understand what you mean by NN layer. Epoch is a notion from JournalManager to the JournalNode. Both need to understand this and provide appropriate guarantees. Currently, the NN code when starting a new log segment looks like this:

{code}
editLogStream = journalSet.startLogSegment(segmentTxId);
...
if (writeHeaderTxn) {
  logEdit(LogSegmentOp.getInstance(
      FSEditLogOpCodes.OP_START_LOG_SEGMENT));
  logSync();
}
{code}

So the operation of starting a segment, and writing the OP_START_LOG_SEGMENT transaction are separate. In general, the JournalManager abstraction doesn't know about the contents of the edits it's writing -- it's just responsible for bytes. If you wanted to include the epoch number in the OP_START_LOG_SEGMENT transaction, you'd have to have the NN code do something like {{journalManager.getCurrentEpoch()}}, and then feed that into the logEdit call. But that's not very generic, so it seems like a leak of abstraction. bq. Whether you store it in a directory per-epoch or record it in the startlogSegment record at the beginning of the segment - they are essentially the same. I agree, if you're talking about prefixing it at the beginning of the file, before the first transaction. But, if you're talking about actually putting it in the content of the first transaction, I think it's a bad idea for the reason above. My preference is to keep it separated from the file, so that the files written by JournalDaemon are exactly identical to the files that would be written by FileJournalManager. That allows you to copy to and from the different types of nodes without any difference in format.
Persist the epoch received by the JournalService Key: HDFS-3212 URL: https://issues.apache.org/jira/browse/HDFS-3212 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: Shared journals (HDFS-3092) Reporter: Suresh Srinivas epoch received over JournalProtocol should be persisted by JournalService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
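The fencing rule under discussion in HDFS-3212 (the journal daemon remembers the highest epoch it has promised and rejects requests from writers with older epochs) can be sketched as follows. This is an illustrative sketch, not the actual JournalProtocol API; the class and method names are hypothetical, and the promised epoch lives in a field here where the real design would persist it to disk so it survives a journal daemon restart.

```java
// Hypothetical sketch of epoch-based fencing for a journal daemon.
public class JournalFencing {
    // Highest epoch promised so far. In the real design this must be
    // written to disk (and fsynced) before replying, so a restarted
    // journal node still rejects a fenced-off old NN.
    private long promisedEpoch = 0;

    // A would-be writer proposes a new epoch; only a strictly larger
    // epoch wins, which fences all writers holding smaller epochs.
    public synchronized boolean newEpoch(long proposed) {
        if (proposed <= promisedEpoch) {
            return false; // stale or duplicate proposal: reject
        }
        promisedEpoch = proposed;
        return true;
    }

    // Every journal()/startLogSegment() request carries the writer's
    // epoch; anything below the promised epoch is from a fenced writer.
    public synchronized boolean accept(long epoch) {
        return epoch >= promisedEpoch;
    }
}
```

Note this is why persisting matters: if the epoch were only in memory, a journal node restart would reset it to 0 and an old NN's writes would be accepted again.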
[jira] [Commented] (HDFS-3203) Currently, the Checkpointer is controlled by time. In this way, it must checkpoint even when there is only one transaction in the checkpoint period. I think it needs to add file size to control checkpoint
[ https://issues.apache.org/jira/browse/HDFS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248023#comment-13248023 ] Todd Lipcon commented on HDFS-3203: --- bq. 1.dfs.namenode.checkpoint.txns is not used in ha It is supposed to be used. See this code:

{code}
if (uncheckpointed >= checkpointConf.getTxnCount()) {
  LOG.info("Triggering checkpoint because there have been " + uncheckpointed +
      " txns since the last checkpoint, which exceeds the configured threshold " +
      checkpointConf.getTxnCount());
  needCheckpoint = true;
} else if (secsSinceLast >= checkpointConf.getPeriod()) {
  LOG.info("Triggering checkpoint because it has been " + secsSinceLast +
      " seconds since the last checkpoint, which exceeds the configured interval " +
      checkpointConf.getPeriod());
  needCheckpoint = true;
}
{code}

If it is not working, please explain how to reproduce. bq. 2.why standbyCheckpointer is running in active namenode and standby namenode? The daemon only runs when the node is in standby mode. When it becomes active, it stops the checkpointer. Currently, the Checkpointer is controlled by time. In this way, it must checkpoint even when there is only one transaction in the checkpoint period. I think it needs to add file size to control checkpoint -- Key: HDFS-3203 URL: https://issues.apache.org/jira/browse/HDFS-3203 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0, 2.0.0 Reporter: liaowenrui -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3178) Add states for journal synchronization in journal daemon
[ https://issues.apache.org/jira/browse/HDFS-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248057#comment-13248057 ] Todd Lipcon commented on HDFS-3178: --- Hey folks. I noticed there's a branch for HDFS-3092, but this got committed to trunk. Was that on purpose? Add states for journal synchronization in journal daemon Key: HDFS-3178 URL: https://issues.apache.org/jira/browse/HDFS-3178 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 3.0.0 Attachments: h3178_20120403_svn_mv.patch, h3178_20120404.patch, h3178_20120404_svn_mv.patch, h3178_20120404b_svn_mv.patch, h3178_20120405.patch, h3178_20120405_svn_mv.patch, svn_mv.sh Journal in a new daemon has to be synchronized to the current transaction. It requires new states such as WaitingForRoll, Syncing and Synced. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3178) Add states for journal synchronization in journal daemon
[ https://issues.apache.org/jira/browse/HDFS-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248082#comment-13248082 ] Todd Lipcon commented on HDFS-3178: --- No concern, just thought it might have been a mistake. Carry on :) Add states for journal synchronization in journal daemon Key: HDFS-3178 URL: https://issues.apache.org/jira/browse/HDFS-3178 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 3.0.0 Attachments: h3178_20120403_svn_mv.patch, h3178_20120404.patch, h3178_20120404_svn_mv.patch, h3178_20120404b_svn_mv.patch, h3178_20120405.patch, h3178_20120405_svn_mv.patch, svn_mv.sh Journal in a new daemon has to be synchronized to the current transaction. It requires new states such as WaitingForRoll, Syncing and Synced. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3084) FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well as host:port
[ https://issues.apache.org/jira/browse/HDFS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246457#comment-13246457 ] Todd Lipcon commented on HDFS-3084: --- No, the scripts and failover controllers use keytab-based or straight user credentials. FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well as host:port - Key: HDFS-3084 URL: https://issues.apache.org/jira/browse/HDFS-3084 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0, 0.23.3 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3084.txt The FenceMethod interface passes along the host:port of the NN that needs to be fenced. That's great for the common case. However, it's likely necessary to have extra configuration parameters for fencing, and these are typically keyed off the nameserviceId.namenodeId (if, for nothing else, consistency with all the other parameters that are keyed off of namespaceId.namenodeId). Obviously this can be backed out from the host:port, but it's inconvenient, and requires iterating through all the configs. The shell interface exhibits the same issue: host:port is great for most fencers, but if you need extra configs (like the host:port of the power supply unit), those are harder to pipe through without the namenodeId. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3168) Clean up FSNamesystem and BlockManager
[ https://issues.apache.org/jira/browse/HDFS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246600#comment-13246600 ] Todd Lipcon commented on HDFS-3168: --- bq. Aaron, it is nothing to do with it. Any contributor could review code. It was a merging problem. Is that the case? I have no opinion on this particular patch and whether a different reviewer might have seen the issue. But I thought you had to get a committer +1 to commit things... Clean up FSNamesystem and BlockManager -- Key: HDFS-3168 URL: https://issues.apache.org/jira/browse/HDFS-3168 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3168_20120330.patch, h3168_20120402.patch, h3168_20120403.patch
[jira] [Commented] (HDFS-3168) Clean up FSNamesystem and BlockManager
[ https://issues.apache.org/jira/browse/HDFS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246651#comment-13246651 ] Todd Lipcon commented on HDFS-3168: --- By my understanding of our policies, the committer who provides the +1 has to be someone other than the patch author. On branches I'm fine being lax here, since we need three +1s to merge a branch, but on trunk, I think it merits a discussion if there is disagreement on what our policies are. Clean up FSNamesystem and BlockManager -- Key: HDFS-3168 URL: https://issues.apache.org/jira/browse/HDFS-3168 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3168_20120330.patch, h3168_20120402.patch, h3168_20120403.patch
[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs
[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246671#comment-13246671 ] Todd Lipcon commented on HDFS-3192: --- Why add multiple stonith paths, given we need external stonith anyway? It just adds to the complexity by increasing the number of scenarios we have to debug, etc. That is to say: if the ZKFC dies, then it will lose its lock, and the other node will stonith this one when it takes over. What's the benefit of having it abort itself at the same time? In fact, it seems to be detrimental, because if it stays up, the other node can do a graceful transitionToStandby() call rather than having to do something more drastic like a full abort. Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs -- Key: HDFS-3192 URL: https://issues.apache.org/jira/browse/HDFS-3192 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Hari Mankude Assignee: Hari Mankude
[jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs
[ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246718#comment-13246718 ] Todd Lipcon commented on HDFS-3192: --- bq. I thought we are not going to have external stonith using special devices and that is mainly the reason why we are going through hoops to implement fencing in journal daemons. In the current design, which uses a filer, we *require* external stonith devices. There is no correct way of doing it without either stonith or storage fencing. The proposal with the journal-daemon based fencing is essentially the same as storage fencing - just that we do it with our own software storage instead of a NAS/SAN. bq. Why is the behaviour different from what happens when zkfc loses the ephemeral node? Currently zkfc when it loses the ephemeral node will shutdown the active NN No, it doesn't - it will transition it to standby. But, as I commented elsewhere, this is redundant, because the _new_ active is actually going to fence it anyway before taking over. bq. Similarly if active NN does not hear from zkfc, it implies that zkfc is dead, going through gc pause essentially resulting in loss of ephemeral node. But this can reduce uptime. For example, imagine an administrator accidentally changes the ACL on zookeeper. This causes both ZKFCs to get an authentication error and crash at the same time. With your design, both NNs will then commit suicide. With the existing implementation, the system will continue to run in its existing state -- i.e. no new failovers will occur, but whoever is active will remain active. bq. If active NN loses quorum, it has to shutdown Yes, it has to shut down _before_ it does any edits, or it has to be fenced by the next active. Notification of session loss is asynchronous. The same is true of your proposal. In either case it can take arbitrarily long before it notices that it should not be active.
So we still require that the new active fence it before it becomes active. So, this proposal doesn't solve any problems. bq. In fact, one of the most of the difficult APIs to implement correctly would be transitionToStandby() from active state. We already have that implemented. It syncs any existing edits, and then stops allowing new ones. We allow failover from one node to another without aborting, so long as it's graceful. This is perfectly correct. If we need to do a non-graceful failover, we fence the node by STONITH or by disallowing further access to the edit logs (which indirectly causes the node to abort, since logSync() fails). It seems you're trying to solve problems we've already solved. Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs -- Key: HDFS-3192 URL: https://issues.apache.org/jira/browse/HDFS-3192 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Hari Mankude Assignee: Hari Mankude
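[Editor's note] The fencing argument in the HDFS-3192 thread above — session-loss notification is asynchronous, so the new active must fence the old one before taking over, and a still-running old active can be ceded gracefully while a dead one needs forced fencing — can be illustrated with a toy state machine. This is plain Python for illustration, not Hadoop code; every class and method name here is invented:

```python
# Toy model of the fencing rule discussed above: a node may only become
# active after the previous active has been fenced, because the old active
# may not yet have noticed that it should step down. All names invented.

class NameNode:
    def __init__(self, name):
        self.name = name
        self.state = "standby"
        self.reachable = True      # can we reach it for a graceful RPC?

    def transition_to_standby(self):
        # Graceful path: syncs outstanding edits, then stops allowing new ones.
        if not self.reachable:
            raise ConnectionError(self.name + " unreachable")
        self.state = "standby"

    def abort(self):
        # Drastic path: stands in for STONITH / revoking edit-log access,
        # which makes logSync() fail and the process exit.
        self.state = "aborted"

def failover(old_active, new_active):
    """Fence the old active (gracefully if possible), then activate the new one."""
    try:
        old_active.transition_to_standby()   # preferred: graceful cede
    except ConnectionError:
        old_active.abort()                   # fallback: forced fencing
    # Only now is it safe to take the active role.
    new_active.state = "active"
    return new_active.state

# Graceful case: the old active is still up, so no abort is needed.
nn1, nn2 = NameNode("nn1"), NameNode("nn2")
nn1.state = "active"
assert failover(nn1, nn2) == "active" and nn1.state == "standby"

# Forced case: the old active is unreachable and must be fenced hard.
nn3, nn4 = NameNode("nn3"), NameNode("nn4")
nn3.state, nn3.reachable = "active", False
failover(nn3, nn4)
assert nn3.state == "aborted" and nn4.state == "active"
```

The two cases at the bottom mirror Todd's point in HDFS-3192: keeping the old node up is what makes the cheap graceful path available; having it kill itself forces every failover down the drastic path.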
[jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246729#comment-13246729 ] Todd Lipcon commented on HDFS-2185: --- Hi Mingjie. Thanks for taking a look. The idea for the chain of RPCs is from talking with some folks here who work on Hadoop deployment. Their opinion was the following: currently, most of the Hadoop client tools are too thick. For example, in the current manual failover implementation, the fencing is run on the admin client. This means that you have to run the haadmin command from a machine that has access to all of the necessary fencing scripts, key files, etc. That's a little bizarre -- you would expect to configure these kinds of things only on the central location, not on the client. So, we decided that it makes sense to push the management of the whole failover process into the FCs themselves, and just use a single RPC to kick off the whole failover process. This keeps the client thin. As for your proposed alternative, here are a few thoughts: bq. existing manual fo code can be kept mostly We actually share much of the code already. But, the problem with using the existing code exactly as is, is that the failover controllers always expect to have complete control over the system. If the state of the NNs changes underneath the ZKFC, then the state in ZK will become inconsistent with the actual state of the system, and it's very easy to get into split brain scenarios. So, the idea is that, when auto-failover is enabled, *all* decisions must be made by ZKFCs. That way we can make sure the ZK state doesn't get out of sync. bq. although new RPC is added to ZKFC but we don't need them to talk to each other. the manual failover logic is all handled at client – haadmin. As noted above I think this is a con, not a pro, because it requires configuring fencing scripts at the client, and likely requiring that the client have read-write access to ZK bq. 
easier to extend to the case of multiple standby NNs I think the extension path to multiple standby is actually equally easy with both approaches. The solution in the ZKFC-managed implementation is to add a new znode like PreferredActive and have nodes avoid becoming active unless they're listed as preferred. The target node of the failover can just set itself to be preferred before asking the other node to cede the lock. Some other advantages that I probably didn't explain well in the design doc: - this design is fault tolerant. If the target node crashes in the middle of the process, then the old active will automatically regain the active state after its rejoin timeout elapses. With a client-managed setup, a well-meaning admin may ^C the process in the middle and leave the system with no active at all. - no need to introduce disable/enable to auto-failover. Just having both nodes quit the election wouldn't work, since one would end up quitting before the other, causing a blip where an unnecessary (random) failover occurred. We could carefully orchestrate the order of quitting, so the active quits last, but I think it still gets complicated. HA: HDFS portion of ZK-based FailoverController --- Key: HDFS-2185 URL: https://issues.apache.org/jira/browse/HDFS-2185 Project: Hadoop HDFS Issue Type: Sub-task Components: auto-failover, ha Affects Versions: 0.24.0, 0.23.3 Reporter: Eli Collins Assignee: Todd Lipcon Fix For: Auto failover (HDFS-3042) Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.tex This jira is for a ZK-based FailoverController daemon. 
The FailoverController is a separate daemon from the NN that does the following: * Initiates leader election (via ZK) when necessary * Performs health monitoring (aka failure detection) * Performs fail-over (standby to active and active to standby transitions) * Heartbeats to ensure the liveness It should have the same/similar interface as the Linux HA RM to aid pluggability.
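[Editor's note] The "PreferredActive znode" extension Todd sketches in the HDFS-2185 comment above — nodes avoid becoming active unless listed as preferred, and the failover target marks itself preferred before asking the current active to cede the lock — can be modeled in a few lines. Plain Python stands in for ZooKeeper here; the class and method names are invented for illustration:

```python
# Toy model of the PreferredActive idea: the election only grants the active
# lock to a node that appears in the preferred set, so a multi-standby
# failover can steer the lock to a chosen target. Not Hadoop code.

class PreferredActiveElection:
    def __init__(self, preferred):
        self.preferred = set(preferred)  # contents of the PreferredActive znode
        self.active = None               # current holder of the active lock

    def set_preferred(self, node):
        # The failover target marks itself preferred *before* the current
        # active is asked to cede the lock.
        self.preferred = {node}

    def try_acquire(self, node):
        # Nodes decline to become active unless they are listed as preferred.
        if self.active is None and node in self.preferred:
            self.active = node
            return True
        return False

election = PreferredActiveElection(preferred=["nn1"])
assert election.try_acquire("nn2") is False   # not preferred: stands down
assert election.try_acquire("nn1") is True    # preferred node takes the lock
```

This also shows why the design is fault tolerant in the way the comment describes: if the target crashes mid-failover, no one else satisfies the preferred check, and the old active can rejoin and reclaim the lock once the preference is reset.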
[jira] [Commented] (HDFS-3178) Add states for journal synchronization in journal daemon
[ https://issues.apache.org/jira/browse/HDFS-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246731#comment-13246731 ] Todd Lipcon commented on HDFS-3178: --- Hi Nicholas. Could you please add some javadoc to the state enum values explaining the purpose of each state, and what the transitions are between them? Or augment the design doc for HDFS-3092 with this state machine, and reference it from the code? Add states for journal synchronization in journal daemon Key: HDFS-3178 URL: https://issues.apache.org/jira/browse/HDFS-3178 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3178_20120403_svn_mv.patch, h3178_20120404.patch, h3178_20120404_svn_mv.patch, svn_mv.sh Journal in a new daemon has to be synchronized to the current transaction. It requires new states such as WaitingForRoll, Syncing and Synced.
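[Editor's note] The kind of documented state machine Todd asks for above can be sketched as follows. This is plain Python, not the actual HDFS-3178 patch, and the transition graph (WaitingForRoll → Syncing → Synced) is an assumption inferred only from the state names in the issue description:

```python
# Hedged sketch of a journal synchronization state machine: a journal in a
# newly started daemon must catch up to the current transaction before it
# can serve live edits. Transitions here are assumed, not taken from the patch.

from enum import Enum

class JournalSyncState(Enum):
    WAITING_FOR_ROLL = "WaitingForRoll"  # waiting for the writer to roll its logs
    SYNCING = "Syncing"                  # copying finalized segments to catch up
    SYNCED = "Synced"                    # caught up to the current transaction

# Assumed legal transitions; anything outside this map is a programming error.
ALLOWED = {
    JournalSyncState.WAITING_FOR_ROLL: {JournalSyncState.SYNCING},
    JournalSyncState.SYNCING: {JournalSyncState.SYNCED},
    JournalSyncState.SYNCED: set(),
}

def transition(current, target):
    """Move to `target`, rejecting transitions not in the assumed graph."""
    if target not in ALLOWED[current]:
        raise ValueError("illegal transition %s -> %s" % (current, target))
    return target

# Walk the happy path: roll, sync, done.
state = JournalSyncState.WAITING_FOR_ROLL
state = transition(state, JournalSyncState.SYNCING)
state = transition(state, JournalSyncState.SYNCED)
```

Encoding the legal transitions in data next to the enum is one way to satisfy the comment's request: the table doubles as documentation and as a runtime guard against skipped states.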
[jira] [Commented] (HDFS-3168) Clean up FSNamesystem and BlockManager
[ https://issues.apache.org/jira/browse/HDFS-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246734#comment-13246734 ] Todd Lipcon commented on HDFS-3168: --- Let's ask the dev list. I'll start a thread. Clean up FSNamesystem and BlockManager -- Key: HDFS-3168 URL: https://issues.apache.org/jira/browse/HDFS-3168 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3168_20120330.patch, h3168_20120402.patch, h3168_20120403.patch