[jira] [Updated] (HDFS-3637) Add support for encrypting the DataTransferProtocol
[ https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3637: - Attachment: HDFS-3637.patch Updated patch addressing Eli's feedback. > Add support for encrypting the DataTransferProtocol > --- > > Key: HDFS-3637 > URL: https://issues.apache.org/jira/browse/HDFS-3637 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client, security >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch, > HDFS-3637.patch > > > Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally > encrypted. However, actual data read or written between DNs and clients (or > DNs to DNs) is sent in the clear. When processing sensitive data on a shared > cluster, confidentiality of the data read/written from/to HDFS may be desired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3637) Add support for encrypting the DataTransferProtocol
[ https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429961#comment-13429961 ] Aaron T. Myers commented on HDFS-3637: -- Thanks a lot for the very thorough review, Eli. Updated patch incoming. bq. Testing? In addition to the included automated tests, I've tested this on a 4-node cluster, reading and writing files, running MR jobs (tera gens/sorts), etc. I've seen no issues. bq. What's the latest performance slowdown for the basic HDFS read/write path with RC4 enabled? I haven't done a really thorough benchmark, but my testing indicates about a 1.8-2.2x slowdown with RC4, and a much higher slowdown with 3DES. I think this description of the relative speed of cipher algorithms in Java is pretty accurate: http://www.javamex.com/tutorials/cryptography/ciphers.shtml bq. Seems like DFSOutputStream#newBlockReader in the conf.useLegacyBlockReader conditional should use a precondition or throw an RTE (eg AssertionError) if encryptionKey is null, otherwise the client will just consider this a dead DN and keep trying. Good point. Changed to a RuntimeException. bq. In the other case it should blow up if encryptionKey is null right, otherwise we can have it enabled server side but allow a client not to use it? Not quite sure what you mean by this. In which case should we blow up if encryptionKey is null? Note that the client will never be allowed to not use encryption if the DN is configured to use it. The error message won't be nice, but no data will ever be transmitted in the clear. bq. The dfs.encrypt.data.transfer description that this is a server-side config Done. bq. Add dfs.encrypt.data.transfer.algorithm with out a default and list two supported values? Added the following: {code} dfs.encrypt.data.transfer.algorithm This value may be set to either "3des" or "rc4". If nothing is set, then the configured JCE default on the system is used (usually 3DES.) It is widely believed that 3DES is more cryptographically secure, but RC4 is substantially faster. {code} bq. Shouldn't shouldEncryptData throw an exception if server defaults is null instead of assume it shouldn't encrypt? Seems more secure, eg if we ever introduce a bug that results in the NN returning a null server default (should never happen currently). No, for compatibility purposes. With the current implementation, an upgraded client talking to an older server (without encryption support) will correctly conclude that it does not need to encrypt data. Again, if we ever were to introduce a bug like you describe, nothing would be sent in the clear, and the client would blow up eventually. bq. Consider pulling out the block manager not setting the block pool ID bug to a separate change? Sorry, it's not a bug. It's because I changed BlockTokenSecretManager to take the BlockPoolId at creation time, instead of every time a BlockToken is created. This is a reasonable change to make since a single BlockTokenSecretManager cannot actually issue valid BlockTokens for anything but a single BlockPoolId. Sorry, I should have mentioned this change in my description of the patch. bq. Use DFS_BLOCK_ACCESS_TOKEN_LIFETIME_DEFAULT instead of 15s? This wouldn't be right, since we've lowered the key update interval and token lifetime earlier in the test. It also needs to be a few multiples of the block token lifetime, since several block tokens are valid at any given time (the current and the last two, by default.) bq. 
Also perhaps update the relevant NN java doc to indicate that "getting" the key generates a new key with this timeout. I called it "getEncryptionKey" to be in keeping with "getDelegationToken". More appropriate for these would probably be "generate" instead of "get". What are your thoughts on this? bq. Jira for supporting encryption or remove this TODO? Well, since we're sort of phasing out support for RemoteBlockReader, I doubt such a JIRA will actually ever be implemented. Perhaps we should just remove the TODO? bq. Are the sendReadResult write timeout and DFSOutputStream#flush a separate issue or something introduced here? It's no functional change - just a refactor so that RemoteBlockReader2#writeReadResult takes a stream as an argument, instead of always creating a new stream from the given socket. > Add support for encrypting the DataTransferProtocol > --- > > Key: HDFS-3637 > URL: https://issues.apache.org/jira/browse/HDFS-3637 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client, security >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch > > > Curre
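To make the configuration discussed above concrete, here is a rough sketch of how the two keys would be set programmatically (illustrative only; in a real deployment they would normally live in hdfs-site.xml, and dfs.encrypt.data.transfer is honored on the server side):
{code}
// Illustrative sketch only: enabling DataTransferProtocol encryption with the
// keys described above. The algorithm may be "3des" or "rc4"; if unset, the
// configured JCE default (usually 3DES) is used.
Configuration conf = new Configuration();
conf.setBoolean("dfs.encrypt.data.transfer", true);
conf.set("dfs.encrypt.data.transfer.algorithm", "rc4");
{code}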
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429943#comment-13429943 ] Hadoop QA commented on HDFS-3672: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539391/hdfs-3672-6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileConcurrentReader +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2961//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2961//console This message is automatically generated. > Expose disk-location information for blocks to enable better scheduling > --- > > Key: HDFS-3672 > URL: https://issues.apache.org/jira/browse/HDFS-3672 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, > hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch > > > Currently, HDFS exposes on which datanodes a block resides, which allows > clients to make scheduling decisions for locality and load balancing. > Extending this to also expose on which disk on a datanode a block resides > would enable even better scheduling, on a per-disk rather than coarse > per-datanode basis. > This API would likely look similar to Filesystem#getFileBlockLocations, but > also involve a series of RPCs to the responsible datanodes to determine disk > ids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429942#comment-13429942 ] Aaron T. Myers commented on HDFS-3672: -- Breaking up DFSClient#getDiskBlockLocations makes the code a lot more readable IMO. Thanks for doing that. A few more comments: # This exception message shouldn't include "getDiskBlockLocations". I recommend you just say "DFSClient#getDiskBlockLocations expected to be given instances of HdfsBlockLocation" # In the "re-group the locatedblocks to be grouped by datanodes..." loop, it seems like instead of the {{if (...)}} check, you could just put the initialization of the LocatedBlock list inside the outer loop, before the inner loop. # Rather than using a hard-coded 10 threads for the ThreadPoolExecutor, please make this configurable. I think it's reasonable to not document it in a *-default.xml file, since most users will never want to change this value, but if someone does find the need to do it it'd be nice to not have to recompile. # Rather than reusing the socket read timeout as the timeout for the RPCs to the DNs, I think this should be separately configurable. That conf value is used as the timeout for reading block data from a DN, and defaults to 60s. I think it's entirely reasonable that callers of this API will want a much lower timeout. For that matter, you might consider calling the version of ScheduledThreadPoolExecutor#invokeAll that takes a timeout as a parameter. # You should add a comment explaining the reasoning for having this loop. (I see why it is, but it's not obvious, so should be explained.) {code} +for (int i = 0; i < futures.size(); i++) { + metadatas.add(null); +} {code} # In the final loop in DFSClient#queryDatanodesForHdfsBlocksMetadata, I recommend you move the fetching of the callable and the datanode objects to the catch clause, since that's the only place those variables are used. # In the same catch clause mentioned above, I recommend you log the full exception stack trace if LOG.isDebugEnabled(). # "did not" should be two words: {code} +LOG.debug("Datanode responded with a block disk id we did" + +"not request, omitting."); {code} # I think we should make it clear in the HdfsDiskId javadoc that it only uniquely identifies a data directory on a DN _when paired with that DN._ i.e. it is not the case that DiskId is unique between DNs. # You shouldn't be using protobuf ByteString outside of the protobuf translator code - just use a byte[]. For that matter, it's only necessary that the final result to clients of the API be an opaque identifier. In the DN-side implementation of the RPC, and even the DFSClient code, you could reasonably use a meaningful value that's not opaque. # How could this possibly happen? {code} +// Oddly, we got a blockpath that didn't match any dataDir. 
+if (diskIndex == dataDirs.size()) { + LOG.warn("Could not determine the data dir of block " + + block.toString() + " with path " + blockPath); +} {code} > Expose disk-location information for blocks to enable better scheduling > --- > > Key: HDFS-3672 > URL: https://issues.apache.org/jira/browse/HDFS-3672 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, > hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch > > > Currently, HDFS exposes on which datanodes a block resides, which allows > clients to make scheduling decisions for locality and load balancing. > Extending this to also expose on which disk on a datanode a block resides > would enable even better scheduling, on a per-disk rather than coarse > per-datanode basis. > This API would likely look similar to Filesystem#getFileBlockLocations, but > also involve a series of RPCs to the responsible datanodes to determine disk > ids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
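As a concrete illustration of the timed invokeAll suggestion above, something along these lines would bound the per-datanode RPCs (a sketch only; the type parameter, names, and timeout are placeholders, assuming the standard java.util.concurrent imports):
{code}
// Sketch only: run the per-datanode callables with a bounded timeout so a slow
// or dead datanode cannot stall the whole getDiskBlockLocations call.
static <T> List<Future<T>> invokeAllWithTimeout(
    List<Callable<T>> callables, int numThreads, long timeoutMs)
    throws InterruptedException {
  ExecutorService executor = Executors.newFixedThreadPool(numThreads);
  try {
    // Futures for calls that do not finish within the timeout come back
    // cancelled, so the caller can record a null result for those datanodes
    // rather than blocking on them.
    return executor.invokeAll(callables, timeoutMs, TimeUnit.MILLISECONDS);
  } finally {
    executor.shutdown();
  }
}
{code}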
[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test
[ https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429928#comment-13429928 ] Hadoop QA commented on HDFS-3634: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539392/HDFS-3634.004.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2960//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2960//console This message is automatically generated. > Add self-contained, mavenized fuse_dfs test > --- > > Key: HDFS-3634 > URL: https://issues.apache.org/jira/browse/HDFS-3634 > Project: Hadoop HDFS > Issue Type: Test > Components: fuse-dfs >Affects Versions: 2.1.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch, > HDFS-3634.004.patch > > > We should have a self-contained, mavenized FUSE unit test which runs as part > of the normal build and can detect problems. Of course, because FUSE is an > optional build component, the unit test won't run unless the user has FUSE > installed. However, it would be very useful in improving the quality of > fuse_dfs and detecting regressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2554) Add separate metrics for missing blocks with desired replication level 1
[ https://issues.apache.org/jira/browse/HDFS-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429917#comment-13429917 ] Eli Collins commented on HDFS-2554: --- Andy, definitely a good problem to solve. Seems like the critical metrics for users are: 1. Num blocks where all available replicas are corrupt ie "Any in-accessible block is bad". This is r>=1, n=0, c>=r (differs from "CorruptBlocksRN" in that r>=1 and c>=r). 2. Ditto but r=1 ie "I'm OK with this, but the corresponding files need to be cleaned up". This is r=1, n=0, c>=1 (differs from "CorruptBlocksR1" in that c>=1). 3. Ditto but r>1 ie "Yikes, somehow all replicas are corrupt, this is bad". This is r>1, n=0, c>=r (differs from "CorruptBlocksRN" in that c>=r) 4. Num blocks where no replicas are live and there are no known corrupt replicas, ie "Yikes, all the DNs hosting these blocks are not available for some reason". This is r>=1, n=0, c=0 (differs from "MissingBlocksRN" in that r>=1). 5. Ditto but r=1 ie "I'm OK with this, but I need to get the relevant DN back on line". This is r=1, n=0, c=0, ie "MissingBlocksR1". Note that a replica may not be considered live because its DN is decommissioning. 6. Ditto but r>1 ie "Yikes, somehow all DNs hosting the block are offline, this is bad". This is r>1, n=0, c=0 ie "MissingBlocksRN". Since you can compute 3 and 6 by subtracting the previous two we technically only need to track the others. Also, I'm slightly altering your definition of "n" here, ie I'm considering it "live" replicas, which doesn't include a decommissioning replica which you might be considering "good" since it's a valid replica. Thoughts? > Add separate metrics for missing blocks with desired replication level 1 > > > Key: HDFS-2554 > URL: https://issues.apache.org/jira/browse/HDFS-2554 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Andy Isaacson >Priority: Minor > > Some users use replication level set to 1 for datasets which are unimportant > and can be lost with no worry (eg the output of terasort tests). But other > data on the cluster is important and should not be lost. It would be useful > to separate the metric for missing blocks by the desired replication level of > those blocks, so that one could ignore missing blocks at repl 1 while still > alerting on missing blocks with higher desired replication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
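To illustrate, the six cases above can be read as a simple classification over (r = desired replication, n = live replicas, c = known-corrupt replicas); this is a sketch of the definitions, not proposed metric code:
{code}
// Sketch of the definitions above; names are placeholders, not patch code.
static String classify(int r, int n, int c) {
  if (r < 1 || n > 0) {
    return "not counted: block still has a live replica";
  }
  if (c >= r) {
    // Cases 1-3: every expected replica is known corrupt (case 1 = 2 + 3).
    return r == 1 ? "case 2 (corrupt, repl 1)" : "case 3 (corrupt, repl > 1)";
  }
  if (c == 0) {
    // Cases 4-6: no live replicas and no known-corrupt replicas (case 4 = 5 + 6).
    return r == 1 ? "case 5 (MissingBlocksR1)" : "case 6 (MissingBlocksRN)";
  }
  return "not counted: some replicas corrupt, some merely unavailable";
}
{code}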
[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
[ https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429912#comment-13429912 ] Vinay commented on HDFS-3765: - Thanks a lot Todd for taking a look. {quote} I'm not 100% convinced the "copy from one edits storage to another" should be lumped in with "initializeSharedEdits"{quote} If you feel we can handle this in separate jira, then fine. I will concentrate only on the genericizing part. {quote}Also, please add a test which uses this new facility to initialize BKJM edits, if you don't mind.{quote} Sure, I will try to add a testcase in BKJM contrib module. > Namenode INITIALIZESHAREDEDITS should be able to initialize all shared > storages > --- > > Key: HDFS-3765 > URL: https://issues.apache.org/jira/browse/HDFS-3765 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha >Affects Versions: 2.1.0-alpha, 3.0.0 >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-3765.patch > > > Currently, NameNode INITIALIZESHAREDEDITS provides ability to copy the edits > files to file schema based shared storages when moving cluster from Non-HA > environment to HA enabled environment. > This Jira focuses on the following > * Generalizing the logic of copying the edits to new shared storage so that > any schema based shared storage can initialized for HA cluster. > * Ability to Initialize new shared storage from existing shared storage when > moving from One shared storage to another shared storage (Might be because of > cost, performance, etc. For ex: Moving from NFS to BKJM/QJM). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3769) standby namenode fails to become active because starting log segment fails on shared storage
[ https://issues.apache.org/jira/browse/HDFS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429687#comment-13429687 ] liaowenrui commented on HDFS-3769: -- Edit log 2354 was written only to the local disk; the active NN was then restarted, became active again, and wrote edit log 2355 to the shared store. Since the write of edit 2354 to the shared store failed, the corresponding FS operation failed as well, so edit 2354 is not a valid edit. But this scenario causes the standby NN to fail when it becomes active. My change: in doTailEdits, when selecting streams from lastTxnId + 1 fails, fall back to streams = editLog.selectInputStreams(lastTxnId + 2, 0, null, false); as follows: private void doTailEdits() throws IOException, InterruptedException { // Write lock needs to be interruptible here because the // transitionToActive RPC takes the write lock before calling // tailer.stop() -- so if we're not interruptible, it will // deadlock. namesystem.writeLockInterruptibly(); try { FSImage image = namesystem.getFSImage(); long lastTxnId = image.getLastAppliedTxId(); if (LOG.isDebugEnabled()) { LOG.debug("lastTxnId: " + lastTxnId); } Collection streams; try { streams = editLog.selectInputStreams(lastTxnId + 1, 0, null, false); } catch (IOException ioe) { try { streams = editLog.selectInputStreams(lastTxnId + 2, 0, null, false); }catch(IOException ioe1) { // This is acceptable. If we try to tail edits in the middle of an edits // log roll, i.e. the last one has been finalized but the new inprogress // edits file hasn't been started yet. LOG.warn("Edits tailer failed to find any streams. Will try again " + "later.", ioe); return; } } if (LOG.isDebugEnabled()) { LOG.debug("edit streams to load from: " + streams.size()); } // Once we have streams to load, errors encountered are legitimate cause // for concern, so we don't catch them here. Simple errors reading from // disk are ignored. 
long editsLoaded = 0; try { editsLoaded = image.loadEdits(streams, namesystem, null); } catch (EditLogInputException elie) { editsLoaded = elie.getNumEditsLoaded(); throw elie; } finally { if (editsLoaded > 0 || LOG.isDebugEnabled()) { LOG.info(String.format("Loaded %d edits starting from txid %d ", editsLoaded, lastTxnId)); } } if (editsLoaded > 0) { lastLoadTimestamp = now(); } lastLoadedTxnId = image.getLastAppliedTxId(); } finally { namesystem.writeUnlock(); } } > standby namenode become ative fail ,because starting log segment fail on > share strage > - > > Key: HDFS-3769 > URL: https://issues.apache.org/jira/browse/HDFS-3769 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.0-alpha > Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143 > 2 namenode:158.1.131.18,158.1.132.19 > 3 zk:158.1.132.18,158.1.132.19,160.161.0.143 > 3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143 > ensemble-size:2,quorum-size:2 >Reporter: liaowenrui >Priority: Critical > Fix For: 2.1.0-alpha, 2.0.1-alpha > > > 2012-08-06 15:09:46,264 ERROR > org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper: Node > /ledgers/available already exists and this is not a retry > 2012-08-06 15:09:46,264 INFO > org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager: Successfully > created bookie available path : /ledgers/available > 2012-08-06 15:09:46,273 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in > /opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current > 2012-08-06 15:09:46,277 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest > edits from old active before taking over writer role in edits logs. > 2012-08-06 15:09:46,363 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication > and invalidation queues... > 2012-08-06 15:09:46,363 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all > datandoes as stale > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of > blocks= 239 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid > blocks = 0 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of > under-replicated blocks = 0 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdf
[jira] [Commented] (HDFS-3769) standby namenode fails to become active because starting log segment fails on shared storage
[ https://issues.apache.org/jira/browse/HDFS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429684#comment-13429684 ] liaowenrui commented on HDFS-3769: -- Active NN editlog: 158-1-132-18:/opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current # ll edits_000235 edits_0002354-0002354 edits_0002355-0002356 edits_0002357-0002358 edits_0002359-0002360 Active NN fsimage file: -rw-r--r-- 1 root root 37545 Aug 6 07:44 fsimage_0002351 -rw-r--r-- 1 root root 62 Aug 6 07:46 fsimage_0002351.md5 -rw-r--r-- 1 root root 37545 Aug 6 09:33 fsimage_0002353 -rw-r--r-- 1 root root 62 Aug 6 09:33 fsimage_0002353.md5 Standby NN editlog: 158-1-132-19:/opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current # ll edits_000235 edits_0002350-0002351 edits_0002352-0002353 Standby NN fsimage file: -rw-r--r-- 1 root root 37545 Aug 6 11:51 fsimage_0002351 -rw-r--r-- 1 root root 62 Aug 6 11:51 fsimage_0002351.md5 -rw-r--r-- 1 root root 37545 Aug 6 13:38 fsimage_0002353 -rw-r--r-- 1 root root 62 Aug 6 13:38 fsimage_0002353.md5 -rw-r--r-- 1 root root 5 Aug 6 11:47 seen_txid share storage editlog: [zk: localhost:2181(CONNECTED) 3] ls /hdfsEdit/ledgers/edits_00235 edits_002352_002353 edits_002357_002358 edits_002355_002356 edits_002350_002351 edits_002359_002360 [zk: localhost:2181(CONNECTED) 2] get /hdfsEdit/maxtxid 2360 cZxid = 0x3002d ctime = Mon Jul 30 05:25:32 EDT 2012 mZxid = 0xb0860 mtime = Mon Aug 06 15:09:36 EDT 2012 pZxid = 0x3002d cversion = 0 dataVersion = 681 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 4 numChildren = 0 we can find edits_0002354-0002354 file only in active nn. when standby nn become active,and load 2354 editlog,but 2354<2360(maxtxid),and then Standby NN throw excption,and shutdown. > standby namenode become ative fail ,because starting log segment fail on > share strage > - > > Key: HDFS-3769 > URL: https://issues.apache.org/jira/browse/HDFS-3769 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.0-alpha > Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143 > 2 namenode:158.1.131.18,158.1.132.19 > 3 zk:158.1.132.18,158.1.132.19,160.161.0.143 > 3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143 > ensemble-size:2,quorum-size:2 >Reporter: liaowenrui >Priority: Critical > Fix For: 2.1.0-alpha, 2.0.1-alpha > > > 2012-08-06 15:09:46,264 ERROR > org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper: Node > /ledgers/available already exists and this is not a retry > 2012-08-06 15:09:46,264 INFO > org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager: Successfully > created bookie available path : /ledgers/available > 2012-08-06 15:09:46,273 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in > /opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current > 2012-08-06 15:09:46,277 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest > edits from old active before taking over writer role in edits logs. > 2012-08-06 15:09:46,363 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication > and invalidation queues... 
> 2012-08-06 15:09:46,363 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all > datandoes as stale > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of > blocks= 239 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid > blocks = 0 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of > under-replicated blocks = 0 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of > over-replicated blocks = 0 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks > being written= 0 > 2012-08-06 15:09:46,383 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing > edit logs at txnid 2354 > 2012-08-06 15:09:46,471 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 2354 > 2
[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429680#comment-13429680 ] Eli Collins commented on HDFS-3754: --- Yea, was looking at that. Don't think it's related filed HDFS-3770 with the rationale. Sanity checked that this test passes for me w/ this patch applied for a couple of runs. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3770) TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed
[ https://issues.apache.org/jira/browse/HDFS-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429679#comment-13429679 ] Eli Collins commented on HDFS-3770: --- Here's the relevant portion of the log: Exception in thread "Thread-2125" java.lang.RuntimeException: org.apache.hadoop.fs.ChecksumException: Checksum error: /block-being-written-to at 1072128 exp: 1082174632 got: -132500175 at org.apache.hadoop.hdfs.TestFileConcurrentReader$4.run(TestFileConcurrentReader.java:383) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /block-being-written-to at 1072128 exp: 1082174632 got: -132500175 at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:297) at org.apache.hadoop.hdfs.RemoteBlockReader2.verifyPacketChecksums(RemoteBlockReader2.java:221) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:191) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:130) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:526) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:578) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:632) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:673) at java.io.DataInputStream.read(DataInputStream.java:83) at org.apache.hadoop.hdfs.TestFileConcurrentReader.tailFile(TestFileConcurrentReader.java:440) at org.apache.hadoop.hdfs.TestFileConcurrentReader.access$200(TestFileConcurrentReader.java:54) at org.apache.hadoop.hdfs.TestFileConcurrentReader$4.run(TestFileConcurrentReader.java:379) ... 1 more Exception in thread "Thread-2124" java.lang.RuntimeException: java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline at org.apache.hadoop.hdfs.TestFileConcurrentReader$3.run(TestFileConcurrentReader.java:367) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1649) at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:1633) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1718) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:99) at org.apache.hadoop.hdfs.TestFileConcurrentReader$3.run(TestFileConcurrentReader.java:363) And this as well.. 
2012-08-06 23:38:14,373 INFO hdfs.StateChange (FSNamesystem.java:reportBadBlocks(4727)) - *DIR* NameNode.reportBadBlocks 2012-08-06 23:38:14,374 INFO hdfs.StateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(66)) - BLOCK NameSystem.addToCorruptReplicasMap: blk_4844811661965065785 added as corrupt on 127.0.0.1:33823 by /127.0.0.1 because client machine reported it 2012-08-06 23:38:14,375 ERROR hdfs.TestFileConcurrentReader (TestFileConcurrentReader.java:run(381)) - error tailing file /block-being-written-to org.apache.hadoop.fs.ChecksumException: Checksum error: /block-being-written-to at 1072128 exp: 1082174632 got: -132500175 at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:297) at org.apache.hadoop.hdfs.RemoteBlockReader2.verifyPacketChecksums(RemoteBlockReader2.java:221) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:191) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:130) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:526) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:578) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:632) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:673) at java.io.DataInputStream.read(DataInputStream.java:83) at org.apache.hadoop.hdfs.TestFileConcurrentReader.tailFile(TestFileConcurrentReader.java:440) at org.apache.hadoop.hdfs.TestFileConcurrentReader.access$200(TestFileConcurrentReader.java:54) at org.apache.hadoop.hdfs.TestFileConcurrentReader$4.run(TestFileConcurrentReader.java:379) at java.lang.Thread.run(Thread.java:662) 2012-08-06 23:38:14,376 ERROR hdfs.TestFileConcurrentReader (TestFileConcurrentReader.java:run(393)) - error in tailer java.lang.RuntimeException: org.apache.hadoop.fs.ChecksumException: Checksum error: /block-being-written-to at 1072128 exp: 1082174632 got: -132500175 at org.apach
[jira] [Created] (HDFS-3770) TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed
Eli Collins created HDFS-3770: - Summary: TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed Key: HDFS-3770 URL: https://issues.apache.org/jira/browse/HDFS-3770 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Eli Collins TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed on [a recent job|https://builds.apache.org/job/PreCommit-HDFS-Build/2959]. Looks like a race in the test. The failure is due to a ChecksumException but that's likely due to the DFSOutputstream getting interrupted on close. Looking at the relevant code, waitForAckedSeqno is getting an InterruptedException waiting on dataQueue, looks like there are uses of interrupt where we're not first notifying dataQueue, or waiting for the notifications to be delivered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3769) standby namenode fails to become active because starting log segment fails on shared storage
liaowenrui created HDFS-3769: Summary: standby namenode become ative fail ,because starting log segment fail on share strage Key: HDFS-3769 URL: https://issues.apache.org/jira/browse/HDFS-3769 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.0.0-alpha Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143 2 namenode:158.1.131.18,158.1.132.19 3 zk:158.1.132.18,158.1.132.19,160.161.0.143 3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143 ensemble-size:2,quorum-size:2 Reporter: liaowenrui Priority: Critical Fix For: 2.1.0-alpha, 2.0.1-alpha 2012-08-06 15:09:46,264 ERROR org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper: Node /ledgers/available already exists and this is not a retry 2012-08-06 15:09:46,264 INFO org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager: Successfully created bookie available path : /ledgers/available 2012-08-06 15:09:46,273 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current 2012-08-06 15:09:46,277 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs. 2012-08-06 15:09:46,363 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication and invalidation queues... 2012-08-06 15:09:46,363 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all datandoes as stale 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks= 239 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks = 0 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of over-replicated blocks = 0 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written= 0 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 2354 2012-08-06 15:09:46,471 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 2354 2012-08-06 15:09:46,472 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 2354 failed for required journal (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@4eda1515, stream=null)) java.io.IOException: We've already seen 2354. 
A new stream cannot be created with it at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.startLogSegment(BookKeeperJournalManager.java:297) at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:86) at org.apache.hadoop.hdfs.server.namenode.JournalSet$2.apply(JournalSet.java:182) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:319) at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:179) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:894) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:268) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:618) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1322) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1230) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact
[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429665#comment-13429665 ] Aaron T. Myers commented on HDFS-3754: -- The specific test case which failed was recently re-enabled after having been disabled for a very long time. Quite possible the failure is unrelated to this particular patch, but worth looking in to. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test
[ https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429662#comment-13429662 ] Andy Isaacson commented on HDFS-3634: - {code} +req.tv_sec = rem.tv_sec; +req.tv_nsec = rem.tv_nsec; {code} This can simply be written {{req = rem;}}. {code} + } while (rem.tv_sec || rem.tv_nsec); {code} I don't see anywhere in the docs that say {{rem}} is zeroed on successful sleep, nor that it isn't modified on successful sleep. The docs say the return value will be 0 on successful sleep. So we should do something like {code} do { req = rem; ret = nanosleep(&req, &rem); } while (ret == -1 && errno == EINTR); if (ret == -1) { fprintf(stderr, "sleepNoSig: nanosleep: %s\n", strerror(errno)); } {code} > Add self-contained, mavenized fuse_dfs test > --- > > Key: HDFS-3634 > URL: https://issues.apache.org/jira/browse/HDFS-3634 > Project: Hadoop HDFS > Issue Type: Test > Components: fuse-dfs >Affects Versions: 2.1.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch, > HDFS-3634.004.patch > > > We should have a self-contained, mavenized FUSE unit test which runs as part > of the normal build and can detect problems. Of course, because FUSE is an > optional build component, the unit test won't run unless the user has FUSE > installed. However, it would be very useful in improving the quality of > fuse_dfs and detecting regressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3768) Exception in TestJettyHelper is incorrect
Jakob Homan created HDFS-3768: - Summary: Exception in TestJettyHelper is incorrect Key: HDFS-3768 URL: https://issues.apache.org/jira/browse/HDFS-3768 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jakob Homan Assignee: Eli Reisman Priority: Minor hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/test/TestJettyHelper.java:80 {noformat} throw new RuntimeException("Could not stop embedded servlet container, " + ex.getMessage(), ex); {noformat} This is being thrown from createJettyServer and was copied and pasted from stop. Should say we can't start the servlet container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
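Concretely, the corrected throw would read something like this (just the wording change described above, everything else unchanged):
{noformat}
throw new RuntimeException("Could not start embedded servlet container, " + ex.getMessage(), ex);
{noformat}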
[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
[ https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429651#comment-13429651 ] Todd Lipcon commented on HDFS-3765: --- Code looks pretty reasonable. But I think we should separate this into two separate patches. I'm not 100% convinced the "copy from one edits storage to another" should be lumped in with "initializeSharedEdits". Would you mind doing just the genericizing part in this JIRA and we can discuss the other use case separately? Also, please add a test which uses this new facility to initialize BKJM edits, if you don't mind. > Namenode INITIALIZESHAREDEDITS should be able to initialize all shared > storages > --- > > Key: HDFS-3765 > URL: https://issues.apache.org/jira/browse/HDFS-3765 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha >Affects Versions: 2.1.0-alpha, 3.0.0 >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-3765.patch > > > Currently, NameNode INITIALIZESHAREDEDITS provides ability to copy the edits > files to file schema based shared storages when moving cluster from Non-HA > environment to HA enabled environment. > This Jira focuses on the following > * Generalizing the logic of copying the edits to new shared storage so that > any schema based shared storage can initialized for HA cluster. > * Ability to Initialize new shared storage from existing shared storage when > moving from One shared storage to another shared storage (Might be because of > cost, performance, etc. For ex: Moving from NFS to BKJM/QJM). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3637) Add support for encrypting the DataTransferProtocol
[ https://issues.apache.org/jira/browse/HDFS-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429649#comment-13429649 ] Eli Collins commented on HDFS-3637: --- ATM, Overall design and implementation looks great - nice work. Testing? What's the latest performance slowdown for the basic HDFS read/write path with RC4 enabled? BlockReaderFactory - Seems like DFSOutputStream#newBlockReader in the conf.useLegacyBlockReader conditional should use a precondition or throw an RTE (eg AssertionError) if encryptionKey is null, otherwise the client will just consider this a dead DN and keep trying. - In the other case it should blow up if encryptionKey is null right, otherwise we can have it enabled server side but allow a client not to use it? hdfs-default.xml - The dfs.encrypt.data.transfer description that this is a server-side config - Add dfs.encrypt.data.transfer.algorithm with out a default and list two supported values? DataTransferEncryptor - What are the main HDFS-specific tweaks/delta from TSaslTransport? DFSClient - Shouldn't shouldEncryptData throw an exception if server defaults is null instead of assume it shouldn't encrypt? Seems more secure, eg if we ever introduce a bug that results in the NN returning a null server default (should never happen currently). FSN - Consider pulling out the block manager not setting the block pool ID bug to a separate change? TestEncryptedTransfer - Use DFS_BLOCK_ACCESS_TOKEN_LIFETIME_DEFAULT instead of 15s? Also perhaps update the relevant NN java doc to indicate that "getting" the key generates a new key with this timeout. RemoteBlockReader - Jira for supporting encryption or remove this TODO? - Are the sendReadResult write timeout and DFSOutputStream#flush a separate issue or something introduced here? > Add support for encrypting the DataTransferProtocol > --- > > Key: HDFS-3637 > URL: https://issues.apache.org/jira/browse/HDFS-3637 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client, security >Affects Versions: 2.0.0-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-3637.patch, HDFS-3637.patch, HDFS-3637.patch > > > Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally > encrypted. However, actual data read or written between DNs and clients (or > DNs to DNs) is sent in the clear. When processing sensitive data on a shared > cluster, confidentiality of the data read/written from/to HDFS may be desired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
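For the first BlockReaderFactory point, the kind of fail-fast check being suggested might look like the following (an illustrative sketch of the suggestion, not the patch itself; Guava's Preconditions is already used widely in the codebase):
{code}
// Sketch only: fail fast with an unchecked exception when the expected
// encryption key is missing, instead of letting the client treat the failure
// as a dead datanode and keep retrying.
Preconditions.checkState(encryptionKey != null,
    "Expected a data transfer encryption key, but none was negotiated");
{code}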
[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429640#comment-13429640 ] Hadoop QA commented on HDFS-3754: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539370/hdfs-3754.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileConcurrentReader +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2959//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2959//console This message is automatically generated. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3754: -- Attachment: hdfs-3754-b1.txt Thanks, here's the patch for branch-1. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754-b1.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3634) Add self-contained, mavenized fuse_dfs test
[ https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3634: --- Attachment: HDFS-3634.004.patch * fix a few style issues * use getmntent rather than reading from /proc/mounts directly. This also means we don't need the code to parse octal escapes. * don't call recursiveDeleteContents on mount point before mounting: instead, give the -ononempty option to FUSE. * sleepNoSig: sleep the full period even in the presence of signals > Add self-contained, mavenized fuse_dfs test > --- > > Key: HDFS-3634 > URL: https://issues.apache.org/jira/browse/HDFS-3634 > Project: Hadoop HDFS > Issue Type: Test > Components: fuse-dfs >Affects Versions: 2.1.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch, > HDFS-3634.004.patch > > > We should have a self-contained, mavenized FUSE unit test which runs as part > of the normal build and can detect problems. Of course, because FUSE is an > optional build component, the unit test won't run unless the user has FUSE > installed. However, it would be very useful in improving the quality of > fuse_dfs and detecting regressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3754: -- Status: Open (was: Patch Available) > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha, 1.2.0 >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429635#comment-13429635 ] Todd Lipcon commented on HDFS-3754: --- +1 > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-3672: -- Attachment: hdfs-3672-6.patch Thanks for the detailed review ATM, I tried to address all your comments. I broke out the huge DFSClient method into a few smaller ones, which are still a bit large but logically sound. I can try to go further with this, but it'll mean passing more stuff in parameters. The config option I added ("dfs.client.file-block-locations.enabled") is default off, and checked client-side only. I could add this to the DN side too if we want to be really sure. > Expose disk-location information for blocks to enable better scheduling > --- > > Key: HDFS-3672 > URL: https://issues.apache.org/jira/browse/HDFS-3672 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, > hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch > > > Currently, HDFS exposes on which datanodes a block resides, which allows > clients to make scheduling decisions for locality and load balancing. > Extending this to also expose on which disk on a datanode a block resides > would enable even better scheduling, on a per-disk rather than coarse > per-datanode basis. > This API would likely look similar to Filesystem#getFileBlockLocations, but > also involve a series of RPCs to the responsible datanodes to determine disk > ids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429611#comment-13429611 ] Hadoop QA commented on HDFS-3754: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539360/hdfs-3754.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2958//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2958//console This message is automatically generated. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3741) QJM: exhaustive failure injection test for skipped RPCs
[ https://issues.apache.org/jira/browse/HDFS-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429581#comment-13429581 ] Aaron T. Myers commented on HDFS-3741: -- This is a pretty baller test, Todd. Good stuff. The patch looks good to me, and I agree it makes sense to go ahead and commit it to the branch. +1 > QJM: exhaustive failure injection test for skipped RPCs > --- > > Key: HDFS-3741 > URL: https://issues.apache.org/jira/browse/HDFS-3741 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-3741.txt > > > This JIRA is to add a test case which exhaustively tests double-failure > scenarios in a 3-node quorum setup. The test instruments the RPCs between the > client and the JNs, and injects faults, simulating a dropped RPC. The > framework used by this test will also be expanded in future JIRAs for other > failure scenarios. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3754: -- Attachment: hdfs-3754.txt Updated patch with comment per Colin's suggestion. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test
[ https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429556#comment-13429556 ] Andy Isaacson commented on HDFS-3634: - {code} +void sleepNoSig(int sec) +{ + struct timespec req, rem; + + req.tv_sec = sec; + req.tv_nsec = 0; + memset(&rem, 0, sizeof(rem)); + nanosleep(&req, &rem); +} {code} Is this supposed to resume the sleep if interrupted? If so we need a loop. If not, we can drop {{rem}} and just {{nanosleep(&req, 0);}}. > Add self-contained, mavenized fuse_dfs test > --- > > Key: HDFS-3634 > URL: https://issues.apache.org/jira/browse/HDFS-3634 > Project: Hadoop HDFS > Issue Type: Test > Components: fuse-dfs >Affects Versions: 2.1.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch > > > We should have a self-contained, mavenized FUSE unit test which runs as part > of the normal build and can detect problems. Of course, because FUSE is an > optional build component, the unit test won't run unless the user has FUSE > installed. However, it would be very useful in improving the quality of > fuse_dfs and detecting regressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
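If the intent is indeed to sleep the full period even when a signal arrives, the loop Andy mentions would look roughly like this (a sketch of one possible fix using the remaining-time value from nanosleep, not the code in the attached patch):
{code}
#include <errno.h>
#include <time.h>

/* Sleep for sec seconds, resuming after any signal interruption so that
 * the full period elapses before returning. */
static void sleepNoSig(int sec)
{
    struct timespec req, rem;

    req.tv_sec = sec;
    req.tv_nsec = 0;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR) {
        req = rem; /* continue sleeping for the remaining time */
    }
}
{code}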
[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429554#comment-13429554 ] Eli Collins commented on HDFS-3754: --- Colin, thanks, sure, I'll add a comment. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429552#comment-13429552 ] Colin Patrick McCabe commented on HDFS-3754: Initializing the ReadaheadPool in DataNode seems like a good idea to me. I also tested the latest patch, and it worked for me. Would it be worthwhile to add a comment to ReadaheadPool about the importance of having the correct thread context? Or maybe just a reference to this JIRA. I know that I definitely wouldn't have considered the importance of thread context in this situation. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3634) Add self-contained, mavenized fuse_dfs test
[ https://issues.apache.org/jira/browse/HDFS-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429553#comment-13429553 ] Andy Isaacson commented on HDFS-3634: - {code} +static int expectDirs(const struct dirent *de, void *v) +{ + const char **names = (const char **)v; {code} no need to cast a void* in C, just assign. {code} + * @return 0 on success; error code otherwise {code} The function returns a negative errno, may as well document that. {code} +fprintf(stderr, "FUSE_TEST: failed to fork: error %d\n", ret); {code} please print {{strerror(errno)}} as well. (Several instances of this pattern, IMO we should never print errno without also printing strerror.) {code} + c = ((src[i + 1] - '0') << 16) | + ((src[i + 2] - '0') << 8) | +(src[i + 3] - '0'); {code} That's not the right way to decode a 3-digit octal string, it yields 0x10503 given "0153". I think you would want <<6 and <<3 but given that we're discussing it, clearly this needs a standalone function and a unit test; I'd copy to a temporary array and use {{strtol(buf, &p, 8)}} then check p. {code} + f = fopen("/proc/mounts", "r"); ... +line = fgets(buf, sizeof(buf), f); {code} Please use {{getmntent(3)}} rather than rolling our own. {code} + snprintf(scratch, sizeof(scratch), "%s", argv0); {code} strncpy is more idiomatic. {code} + char mntTmp[PATH_MAX] = { 0 }; {code} Initialize strings with strings, so this should be {code}char mntTmp[PATH_MAX] = "";{code} . {code} +int recursiveDeleteContents(const char *path) {code} Do we really need this (potentially dangerous) code in the testcase? I'd hate to see a bug result in an accidental {{rm -rf $HOME}}. (I've looked at the obvious cases and don't see any bugs, but that's small comfort.) The target hdfs will be deleted afterwards so no need to delete there; the local target is pretty small so leaving it around is no big deal. Some of the code seems to indicate that something will error out if you attempt to mount over a directory with contents, but that seems like just a bug? {code} +if ((de->d_name[0] == '.') && (de->d_name[1] == '\0')) + continue; +if ((de->d_name[0] == '.') && (de->d_name[1] == '.') && +(de->d_name[2] == '\0')) {code} These would be more idiomatic as {{if (!strcmp(de->d_name, "."))}}, IMHO. But, your call. {code} +// canonicalize non-abosolute TMPDIR {code} abosolute -> absolute > Add self-contained, mavenized fuse_dfs test > --- > > Key: HDFS-3634 > URL: https://issues.apache.org/jira/browse/HDFS-3634 > Project: Hadoop HDFS > Issue Type: Test > Components: fuse-dfs >Affects Versions: 2.1.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3634.002.patch, HDFS-3634.003.patch > > > We should have a self-contained, mavenized FUSE unit test which runs as part > of the normal build and can detect problems. Of course, because FUSE is an > optional build component, the unit test won't run unless the user has FUSE > installed. However, it would be very useful in improving the quality of > fuse_dfs and detecting regressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
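A standalone octal-escape decoder along the lines suggested above might look like the following sketch. The name decodeOctalEscape is a placeholder and the helper assumes the usual mount-table convention of a backslash followed by exactly three octal digits; it is not code from the attached patch.
{code}
#include <stdlib.h>
#include <string.h>

/* Decode the three octal digits of an escape such as "\040" into a single
 * byte. src points at the first digit. Returns 0 on success, -1 if the
 * digits are malformed or the value is out of range for a byte. */
static int decodeOctalEscape(const char *src, char *out)
{
    char buf[4], *endptr;
    long val;

    memcpy(buf, src, 3);
    buf[3] = '\0';
    val = strtol(buf, &endptr, 8);
    if (endptr != buf + 3 || val < 0 || val > 0xff)
        return -1;
    *out = (char)val;
    return 0;
}
{code}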
[jira] [Updated] (HDFS-3754) BlockSender doesn't shutdown ReadaheadPool threads
[ https://issues.apache.org/jira/browse/HDFS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3754: -- Attachment: hdfs-3754.txt Good point, it's possible in a test that an existing active block sender could race with the shutdown of another DN and submit to the pool that's been shut down. I like the idea of making the ReadaheadPool not part of the dataXceiverServer thread group; this can actually be accomplished more easily by just moving the initialization from BlockSender to DataNode, which is a more logical place anyway. Updated patch attached. > BlockSender doesn't shutdown ReadaheadPool threads > -- > > Key: HDFS-3754 > URL: https://issues.apache.org/jira/browse/HDFS-3754 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 1.2.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt, > hdfs-3754.txt, hdfs-3754.txt, hdfs-3754.txt > > > The BlockSender doesn't shutdown the ReadaheadPool threads so when tests are > run with native libraries some tests fail (time out) because shutdown hangs > waiting for the outstanding threads to exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3723) All commands should support meaningful --help
[ https://issues.apache.org/jira/browse/HDFS-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429488#comment-13429488 ] Suresh Srinivas commented on HDFS-3723: --- Comments: # It may be a good idea to have another jira that adds a utility for the often-repeated things in this patch. # In if conditions, you do not need parentheses around conditions such as "-h".equalsIgnoreCase() etc. # GetGroups.java Uncomment ToolRunner.printGenericCommandUsage # Can you please ensure an empty line is printed before printing generic command usage to separate the command-related args from generic args. # In DFSck.java, set the returned result to zero when the -help command is passed. # DFSZKFailoverController.java - what is the "|" for in {{java zkfc [ -formatZK [-force] | [-nonInteractive] ]}} # TestHAAdmin.java - retain the previous test to check for -1 when you pass an invalid option and add new tests for -help, -h and --help. Could we add these tests for all the commands, if it is straightforward? Unrelated to your patch (since you are making changes in these files already): # DelegationTokenFetcher.java #* Remove unnecessary imports DFSConfigKeys, URLUtils, Text #* printUsage should not throw IOException > All commands should support meaningful --help > - > > Key: HDFS-3723 > URL: https://issues.apache.org/jira/browse/HDFS-3723 > Project: Hadoop HDFS > Issue Type: Improvement > Components: scripts, tools >Affects Versions: 2.0.0-alpha >Reporter: E. Sammer >Assignee: Jing Zhao > Attachments: HDFS-3723.patch, HDFS-3723.patch > > > Some (sub)commands support -help or -h options for detailed help while others > do not. Ideally, all commands should support meaningful help that works > regardless of current state or configuration. > For example, hdfs zkfc --help (or -h or -help) is not very useful. Option > checking should occur before state / configuration checking. > {code} > [esammer@hadoop-fed01 ~]# hdfs zkfc --help > Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: > HA is not enabled for this namenode. > at > org.apache.hadoop.hdfs.tools.DFSZKFailoverController.setConf(DFSZKFailoverController.java:122) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:66) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:168) > {code} > This would go a long way toward better usability for ops staff. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429481#comment-13429481 ] Hudson commented on HDFS-3579: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2575 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2575/]) Add two new files missed by last commit of HDFS-3579. (Revision 1370017) HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick McCabe. (Revision 1370015) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.2.0-alpha > > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429445#comment-13429445 ] Hudson commented on HDFS-3579: -- Integrated in Hadoop-Common-trunk-Commit #2556 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2556/]) Add two new files missed by last commit of HDFS-3579. (Revision 1370017) HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick McCabe. (Revision 1370015) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.2.0-alpha > > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429443#comment-13429443 ] Hudson commented on HDFS-3579: -- Integrated in Hadoop-Hdfs-trunk-Commit #2621 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2621/]) Add two new files missed by last commit of HDFS-3579. (Revision 1370017) HDFS-3579. libhdfs: fix exception handling. Contributed by Colin Patrick McCabe. (Revision 1370015) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370017 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.h atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1370015 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.h * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/native_mini_dfs.c > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.2.0-alpha > > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429430#comment-13429430 ] Aaron T. Myers commented on HDFS-3579: -- bq. I have tried running valgrind on fuse_dfs in the past. It doesn't really work-- Got it, thanks for the explanation. I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Colin. Fixes like this are yeoman's work. Thanks for doing it. > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.2.0-alpha > > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3579: - Resolution: Fixed Fix Version/s: 2.2.0-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.2.0-alpha > > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429427#comment-13429427 ] Colin Patrick McCabe commented on HDFS-3579: Thanks, atm. I have tried running valgrind on fuse_dfs in the past. It doesn't really work-- I get tons of false positives. I think there's a general problem running valgrind with JNI code. I did try adding more and more stuff to the "exclude lists," but it didn't seem to work. Maybe someone more knowledgeable on this topic can come up with a workaround. I'm also confused about whether valgrind can identify memory leaks of memory managed by the JVM. I suspect that the answer is "no," which would mean that the local reference leaks fixed by the patch would have been invisible to valgrind anyway. As far as I know, valgrind only deals with memory allocated via {{malloc}}-- although I'm happy to be corrected on this topic if someone has more info ( ? ) > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
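Colin's point about JVM-managed memory can be illustrated with a small JNI sketch: a leaked local reference pins an object on the Java heap rather than anything allocated with malloc, so a malloc/free tracker like valgrind has nothing to report. The function name and string below are placeholders, not libhdfs code.
{code}
#include <jni.h>

/* A local reference created in native code keeps the Java object alive
 * until the native frame returns or DeleteLocalRef is called. Omitting
 * the DeleteLocalRef leaks the reference, but the backing memory lives
 * on the JVM heap, invisible to malloc-based leak checkers. */
static void example(JNIEnv *env)
{
    jstring jstr = (*env)->NewStringUTF(env, "hello");
    if (!jstr) {
        /* An exception is now pending; it must be handled or cleared
         * before making any further JNI calls. */
        (*env)->ExceptionClear(env);
        return;
    }
    /* ... use jstr ... */
    (*env)->DeleteLocalRef(env, jstr);
}
{code}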
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429422#comment-13429422 ] Aaron T. Myers commented on HDFS-3579: -- bq. We need a longer running test that exercises more failure conditions to fully establish that all memory leaks are fixed. I think writing such a test is a little bit out of scope for this JIRA, but it's definitely something we should do in the future. Definitely agree that writing such a test is well out of scope for this JIRA, but would it be possible to, for example, run test_fuse_dfs with valgrind? (No need to do that for this JIRA. This is good cleanup regardless, and we can fix any other memory leaks found in a different JIRA.) {quote} Yes. Running a before and after with LIBHDFS_OPTS="-Xcheck:jni -Xcheck:nabounds" confirms that the messages about "JNI call made with exception pending" are gone after the patch. The test I ran was test_libhdfs_threaded. I also ran test_fuse_dfs and verified that it passed successfully. That test also exercises libhdfs, albeit in a slightly different way. {quote} Cool, thanks for doing that. +1, I'll go ahead and commit this patch. > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3758) TestFuseDFS test failing
[ https://issues.apache.org/jira/browse/HDFS-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429418#comment-13429418 ] Colin Patrick McCabe commented on HDFS-3758: Just to be clear, the reason for foregrounding fuse_dfs is so we can capture the log output, which we otherwise would not see. Not having log output makes debugging difficult, as you might imagine. > TestFuseDFS test failing > > > Key: HDFS-3758 > URL: https://issues.apache.org/jira/browse/HDFS-3758 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3758-b1.001.patch, HDFS-3758.003.patch, > TestFuseDFS-fix-0002.patch > > > TestFuseDFS.java has two bugs: > * there is a race condition between mounting the filesystem and testing it > * it doesn't clear the mount directory before it tries to mount there -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3758) TestFuseDFS test failing
[ https://issues.apache.org/jira/browse/HDFS-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429417#comment-13429417 ] Colin Patrick McCabe commented on HDFS-3758: Here's a little more explanation of the patch: {{-ononempty}} allows FUSE to mount over a non-empty directory. Since we previously had a bug which could result in the fuse mount directory getting full of junk, you can see why this is useful. This patch also changes the way we run fuse_dfs slightly. Rather than running it in the background, we run it in the foreground, piping its stdout and stderr to java threads. This is the meaning of the {{-f}} option. > TestFuseDFS test failing > > > Key: HDFS-3758 > URL: https://issues.apache.org/jira/browse/HDFS-3758 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3758-b1.001.patch, HDFS-3758.003.patch, > TestFuseDFS-fix-0002.patch > > > TestFuseDFS.java has two bugs: > * there is a race condition between mounting the filesystem and testing it > * it doesn't clear the mount directory before it tries to mount there -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429408#comment-13429408 ] Colin Patrick McCabe commented on HDFS-3579: bq. Are the warnings about pending exceptions now gone from the logs? Yes. Running a before and after with LIBHDFS_OPTS="-Xcheck:jni -Xcheck:nabounds" confirms that the messages about "JNI call made with exception pending" are gone after the patch. The test I ran was test_libhdfs_threaded. I also ran test_fuse_dfs and verified that it passed successfully. That test also exercises libhdfs, albeit in a slightly different way. We need a longer running test that exercises more failure conditions to fully establish that all memory leaks are fixed. I think writing such a test is a little bit out of scope for this JIRA, but it's definitely something we should do in the future. > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3767) Finer grained locking in DN
Todd Lipcon created HDFS-3767: - Summary: Finer grained locking in DN Key: HDFS-3767 URL: https://issues.apache.org/jira/browse/HDFS-3767 Project: Hadoop HDFS Issue Type: Improvement Components: performance Affects Versions: 3.0.0 Reporter: Todd Lipcon In testing a high-write-throughput workload, I see the DN maintain good performance most of the time, except that occasionally one thread will block for a few seconds in {{finalizeReplica}}. It does so holding the FSDatasetImpl lock, which causes all other writer threads to block behind it. HDFS-1148 (making it a rw lock) would help here, but a bigger help would be to go do finer-grained locking (eg per block or per-subdir). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
[ https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429363#comment-13429363 ] Todd Lipcon commented on HDFS-3765: --- Hey Vinay. Thanks a lot for doing this - it's been on my list but hadn't gotten to it yet. Do you plan to add a test case, perhaps against the BKJM implementation? I'll look at the code as soon as I can. > Namenode INITIALIZESHAREDEDITS should be able to initialize all shared > storages > --- > > Key: HDFS-3765 > URL: https://issues.apache.org/jira/browse/HDFS-3765 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha >Affects Versions: 2.1.0-alpha, 3.0.0 >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-3765.patch > > > Currently, NameNode INITIALIZESHAREDEDITS provides ability to copy the edits > files to file schema based shared storages when moving cluster from Non-HA > environment to HA enabled environment. > This Jira focuses on the following > * Generalizing the logic of copying the edits to new shared storage so that > any schema based shared storage can initialized for HA cluster. > * Ability to Initialize new shared storage from existing shared storage when > moving from One shared storage to another shared storage (Might be because of > cost, performance, etc. For ex: Moving from NFS to BKJM/QJM). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429354#comment-13429354 ] Aaron T. Myers commented on HDFS-3672: -- Patch looks pretty good to me. A few comments: # In DFSClient#getDiskBlockLocations, I recommend you add an instanceof check before the BlockLocation downcast to HdfsBlockLocation. Much better to throw a helpful RTE than some opaque ClassCastException. # The DFSClient#getDiskBlockLocations method is huge, and has a few very distinct phases. I recommend you break this up into a few separate helper methods, e.g. one or two to initialize the data structures, one or two to perform the RPCs, one to re-associate the DN results with the correct block, etc. # Unless I'm missing something, seems like you could easily make DiskBlockLocationCallable a static inner class. # The javadoc parameter comment "@param blocks a List" is not very helpful, since when the javadocs are generated the type of the parameter will automatically be included. # The javadoc for DFSClient#getDiskBlockLocations should be a proper javadoc, i.e. with @param and @return tags. I also recommend having this javadoc reference DistributedFileSystem#getFileDiskBlockLocations. # In the new javadoc in DistributedFileSystem, you incorrectly say that this interface exists in the FileSystem class as well, and say "this is more helpful with DFS", which is the only implementation. # I think you should change the LimitedPrivate InterfaceAudience annotations to Public, but keep the Unstable InterfaceStability annotations. # Put a single space around your operators, e.g. "for (int i = 0; ...)" rather than "for (int i=0; ...)". > Expose disk-location information for blocks to enable better scheduling > --- > > Key: HDFS-3672 > URL: https://issues.apache.org/jira/browse/HDFS-3672 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, > hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch > > > Currently, HDFS exposes on which datanodes a block resides, which allows > clients to make scheduling decisions for locality and load balancing. > Extending this to also expose on which disk on a datanode a block resides > would enable even better scheduling, on a per-disk rather than coarse > per-datanode basis. > This API would likely look similar to Filesystem#getFileBlockLocations, but > also involve a series of RPCs to the responsible datanodes to determine disk > ids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3715) Fix TestFileCreation#testFileCreationNamenodeRestart
[ https://issues.apache.org/jira/browse/HDFS-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429294#comment-13429294 ] Andrew Wang commented on HDFS-3715: --- This test failure could be related to HDFS-3658, the logs look like the same problem even though the assert failure is a bit different. Anyway, I ran this test locally and it worked. I believe unrelated. > Fix TestFileCreation#testFileCreationNamenodeRestart > > > Key: HDFS-3715 > URL: https://issues.apache.org/jira/browse/HDFS-3715 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.2.0-alpha >Reporter: Eli Collins >Assignee: Andrew Wang > Attachments: hdfs-3715-1.patch > > > TestFileCreation#testFileCreationNamenodeRestart is ignored due to the > following. We should (a) modify this test to test the current expected > behavior for leases on restart and (b) file any jiras for necessary fixes to > close the gap between current and desired behavior. > {code} > /** >* Test that file leases are persisted across namenode restarts. >* This test is currently not triggered because more HDFS work is >* is needed to handle persistent leases. >*/ > @Ignore > @Test > public void xxxtestFileCreationNamenodeRestart() throws IOException { > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3766) TestStorageRestore fails on Windows
Brandon Li created HDFS-3766: Summary: TestStorageRestore fails on Windows Key: HDFS-3766 URL: https://issues.apache.org/jira/browse/HDFS-3766 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 1-win Reporter: Brandon Li Assignee: Brandon Li Test setup failed because it can't delete the directories/files being used by the test itself. Unlike Linux, Windows doesn't allow deleting a file or directory which is opened with no share/delete permission by a different process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3579) libhdfs: fix exception handling
[ https://issues.apache.org/jira/browse/HDFS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429264#comment-13429264 ] Aaron T. Myers commented on HDFS-3579: -- I've taken a look at the patch and it looks good to me. I agree that this is some good cleanup to do. Thanks a lot, Andy, for your very thorough review. One question before I commit this patch, though: can you please describe what sort of testing you did to verify this change? Are the warnings about pending exceptions now gone from the logs? Were you able to ensure that memory is no longer leaked when exceptions are thrown? > libhdfs: fix exception handling > --- > > Key: HDFS-3579 > URL: https://issues.apache.org/jira/browse/HDFS-3579 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-3579.004.patch, HDFS-3579.005.patch, > HDFS-3579.006.patch > > > libhdfs does not consistently handle exceptions. Sometimes we don't free the > memory associated with them (memory leak). Sometimes we invoke JNI functions > that are not supposed to be invoked when an exception is active. > Running a libhdfs test program with -Xcheck:jni shows the latter problem > clearly: > {code} > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > WARNING in native method: JNI call made with exception pending > Exception in thread "main" java.io.IOException: ... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429174#comment-13429174 ] Hudson commented on HDFS-3719: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2573 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2573/]) HDFS-3719. Re-enable append-related tests in TestFileConcurrentReader. Contributed by Andrew Wang. (Revision 1369848) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1369848 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java > Re-enable append-related tests in TestFileConcurrentReader > -- > > Key: HDFS-3719 > URL: https://issues.apache.org/jira/browse/HDFS-3719 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Fix For: 2.2.0-alpha > > Attachments: hdfs-3719-1.patch > > > Both of these tests are disabled. We should figure out what append > functionality we need to make the tests work again, and reenable them. > {code} > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorTransferToAppend() > throws IOException { > runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); > } > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorNormalTransferAppend() > throws IOException { > runTestUnfinishedBlockCRCError(false, SyncType.APPEND, > DEFAULT_WRITE_SIZE); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429152#comment-13429152 ] Hudson commented on HDFS-3719: -- Integrated in Hadoop-Common-trunk-Commit #2554 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2554/]) HDFS-3719. Re-enable append-related tests in TestFileConcurrentReader. Contributed by Andrew Wang. (Revision 1369848) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1369848 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java > Re-enable append-related tests in TestFileConcurrentReader > -- > > Key: HDFS-3719 > URL: https://issues.apache.org/jira/browse/HDFS-3719 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Fix For: 2.2.0-alpha > > Attachments: hdfs-3719-1.patch > > > Both of these tests are disabled. We should figure out what append > functionality we need to make the tests work again, and reenable them. > {code} > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorTransferToAppend() > throws IOException { > runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); > } > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorNormalTransferAppend() > throws IOException { > runTestUnfinishedBlockCRCError(false, SyncType.APPEND, > DEFAULT_WRITE_SIZE); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3719: - Resolution: Fixed Fix Version/s: 2.2.0-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Andrew. > Re-enable append-related tests in TestFileConcurrentReader > -- > > Key: HDFS-3719 > URL: https://issues.apache.org/jira/browse/HDFS-3719 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Fix For: 2.2.0-alpha > > Attachments: hdfs-3719-1.patch > > > Both of these tests are disabled. We should figure out what append > functionality we need to make the tests work again, and reenable them. > {code} > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorTransferToAppend() > throws IOException { > runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); > } > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorNormalTransferAppend() > throws IOException { > runTestUnfinishedBlockCRCError(false, SyncType.APPEND, > DEFAULT_WRITE_SIZE); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429151#comment-13429151 ] Hudson commented on HDFS-3719: -- Integrated in Hadoop-Hdfs-trunk-Commit #2619 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2619/]) HDFS-3719. Re-enable append-related tests in TestFileConcurrentReader. Contributed by Andrew Wang. (Revision 1369848) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1369848 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java > Re-enable append-related tests in TestFileConcurrentReader > -- > > Key: HDFS-3719 > URL: https://issues.apache.org/jira/browse/HDFS-3719 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Fix For: 2.2.0-alpha > > Attachments: hdfs-3719-1.patch > > > Both of these tests are disabled. We should figure out what append > functionality we need to make the tests work again, and reenable them. > {code} > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorTransferToAppend() > throws IOException { > runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); > } > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorNormalTransferAppend() > throws IOException { > runTestUnfinishedBlockCRCError(false, SyncType.APPEND, > DEFAULT_WRITE_SIZE); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3719: - Summary: Re-enable append-related tests in TestFileConcurrentReader (was: Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and #testUnfinishedBlockCRCErrorNormalTransferAppend) > Re-enable append-related tests in TestFileConcurrentReader > -- > > Key: HDFS-3719 > URL: https://issues.apache.org/jira/browse/HDFS-3719 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-3719-1.patch > > > Both of these tests are disabled. We should figure out what append > functionality we need to make the tests work again, and reenable them. > {code} > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorTransferToAppend() > throws IOException { > runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); > } > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorNormalTransferAppend() > throws IOException { > runTestUnfinishedBlockCRCError(false, SyncType.APPEND, > DEFAULT_WRITE_SIZE); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3719) Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and #testUnfinishedBlockCRCErrorNormalTransferAppend
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429148#comment-13429148 ] Aaron T. Myers commented on HDFS-3719: -- +1, the patch looks good to me. I'm going to commit this momentarily. > Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and > #testUnfinishedBlockCRCErrorNormalTransferAppend > - > > Key: HDFS-3719 > URL: https://issues.apache.org/jira/browse/HDFS-3719 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-3719-1.patch > > > Both of these tests are disabled. We should figure out what append > functionality we need to make the tests work again, and reenable them. > {code} > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorTransferToAppend() > throws IOException { > runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); > } > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorNormalTransferAppend() > throws IOException { > runTestUnfinishedBlockCRCError(false, SyncType.APPEND, > DEFAULT_WRITE_SIZE); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3719) Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and #testUnfinishedBlockCRCErrorNormalTransferAppend
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3719: - Target Version/s: 2.2.0-alpha > Fix TestFileConcurrentReader#testUnfinishedBlockCrcErrorTransferToAppend and > #testUnfinishedBlockCRCErrorNormalTransferAppend > - > > Key: HDFS-3719 > URL: https://issues.apache.org/jira/browse/HDFS-3719 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-3719-1.patch > > > Both of these tests are disabled. We should figure out what append > functionality we need to make the tests work again, and reenable them. > {code} > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorTransferToAppend() > throws IOException { > runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); > } > // fails due to issue w/append, disable > @Ignore > @Test > public void _testUnfinishedBlockCRCErrorNormalTransferAppend() > throws IOException { > runTestUnfinishedBlockCRCError(false, SyncType.APPEND, > DEFAULT_WRITE_SIZE); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3744) Decommissioned nodes are included in cluster after switch which is not expected
[ https://issues.apache.org/jira/browse/HDFS-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429144#comment-13429144 ] Aaron T. Myers commented on HDFS-3744: -- bq. And I would like to add Standby check at replication monitor to avoid load in cluster. Got it. This seems like a separate issue from what's being discussed here, though, and so should probably be done as a separate JIRA. Do you agree? bq. By persisting into edit logs we can be sure of which DN is decommissioned? Not only by Standby NN but also when Standalone NN restarts. The question that I have is still "How would differences be rectified between what's persisted in the edit log and what's present in the excluded hosts file?" Imagine that some host is not present in the excluded hosts file, but a decommission action for that host is present in the edit log. Given that edit logs are occasionally merged into an fsimage and the edit logs discarded, this would imply that we'd need to introduce a new section into the fsimage for per-host DN status. This means that we'd end up with two potentially out-of-sync lists of DN decommission status: one in the excludes file, the other in this new section of the fsimage file. My point is that I think persisting DN decommission status to the edit log / fsimage is not an unreasonable idea, but it does seem like an idea that's incompatible with the excluded hosts config file. Given that, I'm still in favor of just requiring that the admin keep the excluded hosts files in sync, and calling refreshNodes on both NNs from DFSAdmin. I think this argument is further supported by the fact that the active and standby NNs having an out-of-sync view of DN decommission status isn't actually that big a problem. Yes, it might result in some unnecessary replication traffic, but it shouldn't result in data loss or unavailability, since DNs already ignore replication commands from anything but the active NN. > Decommissioned nodes are included in cluster after switch which is not > expected > --- > > Key: HDFS-3744 > URL: https://issues.apache.org/jira/browse/HDFS-3744 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.0-alpha, 2.1.0-alpha, 2.0.1-alpha >Reporter: Brahma Reddy Battula > > Scenario: > = > Start the ANN and SNN with three DNs. > Exclude DN1 from the cluster using the decommission feature > (./hdfs dfsadmin -fs hdfs://ANNIP:8020 -refreshNodes). > After the decommission succeeds, perform a failover so that the SNN becomes Active. > The excluded node (DN1) is now included in the cluster, and files can be written to the excluded > node since it is no longer marked as excluded. > The SNN (Active before the switch) UI shows decommissioned=1, while the ANN UI shows > decommissioned=0. > One more observation: > > All dfsadmin commands create a proxy only on nn1, irrespective of which NN is Active or > Standby. I think this also needs to be re-examined. > I do not understand why HA is not provided for dfsadmin commands. > Please correct me if I am wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
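To illustrate the workflow suggested above, a minimal sketch of keeping decommission state consistent across an HA pair: maintain identical excluded hosts files on both NameNodes and run refreshNodes against each NN explicitly. The hostnames and port below are placeholders, not values from this issue.
{code}
# Hypothetical NN addresses; substitute the real active and standby NNs.
# After adding the host to the (identical) excludes file on both NNs:
./hdfs dfsadmin -fs hdfs://nn1.example.com:8020 -refreshNodes
./hdfs dfsadmin -fs hdfs://nn2.example.com:8020 -refreshNodes
{code}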
[jira] [Updated] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
[ https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-3765: Attachment: HDFS-3765.patch Attached a patch for this. > Namenode INITIALIZESHAREDEDITS should be able to initialize all shared > storages > --- > > Key: HDFS-3765 > URL: https://issues.apache.org/jira/browse/HDFS-3765 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha >Affects Versions: 2.1.0-alpha, 3.0.0 >Reporter: Vinay >Assignee: Vinay > Attachments: HDFS-3765.patch > > > Currently, NameNode INITIALIZESHAREDEDITS provides the ability to copy the edits > files to file-scheme-based shared storage when moving a cluster from a non-HA > environment to an HA-enabled environment. > This JIRA focuses on the following: > * Generalizing the logic of copying the edits to the new shared storage so that > any scheme-based shared storage can be initialized for an HA cluster. > * The ability to initialize a new shared storage from an existing shared storage when > moving from one shared storage to another (possibly because of > cost, performance, etc.; for example, moving from NFS to BKJM/QJM). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
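For context, the existing non-HA-to-HA flow that this JIRA generalizes looks roughly like the sketch below; the shared edits location is a placeholder and the exact steps may vary by release.
{code}
# Illustrative only. Point dfs.namenode.shared.edits.dir at the shared storage
# (e.g. an NFS path or a BKJM/QJM URI), then seed it from the local edits:
./hdfs namenode -initializeSharedEdits
{code}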
[jira] [Commented] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429105#comment-13429105 ] Tsz Wo (Nicholas), SZE commented on HDFS-385: - I have committed the branch-1 and branch-1-win patches. Thanks, Suma! > Design a pluggable interface to place replicas of blocks in HDFS > > > Key: HDFS-385 > URL: https://issues.apache.org/jira/browse/HDFS-385 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > Fix For: 0.21.0 > > Attachments: BlockPlacementPluggable.txt, > BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, > BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, > BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt, > BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch, > blockplacementpolicy-branch-1.patch, > blockplacementpolicy2-branch-1-win.patch, > blockplacementpolicy2-branch-1.patch, > blockplacementpolicy3-branch-1-win.patch, > blockplacementpolicy3-branch-1.patch, rat094.txt > > > The current HDFS code typically places one replica on local rack, the second > replica on remote random rack and the third replica on a random node of that > remote rack. This algorithm is baked in the NameNode's code. It would be nice > to make the block placement algorithm a pluggable interface. This will allow > experimentation of different placement algorithms based on workloads, > availability guarantees and failure models. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
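For anyone trying the pluggable interface, selecting a custom placement policy is expected to be a configuration change along these lines; the class name is hypothetical and the property name is the one understood to be introduced by this change, so verify it against the committed patch.
{code}
<!-- Illustrative hdfs-site.xml snippet; org.example.MyBlockPlacementPolicy is
     a hypothetical class that would extend the pluggable policy base class. -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.example.MyBlockPlacementPolicy</value>
</property>
{code}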
[jira] [Commented] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429004#comment-13429004 ] Tsz Wo (Nicholas), SZE commented on HDFS-385: - +1 the branch-1 patch looks good. > Design a pluggable interface to place replicas of blocks in HDFS > > > Key: HDFS-385 > URL: https://issues.apache.org/jira/browse/HDFS-385 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > Fix For: 0.21.0 > > Attachments: BlockPlacementPluggable.txt, > BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, > BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, > BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt, > BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch, > blockplacementpolicy-branch-1.patch, > blockplacementpolicy2-branch-1-win.patch, > blockplacementpolicy2-branch-1.patch, > blockplacementpolicy3-branch-1-win.patch, > blockplacementpolicy3-branch-1.patch, rat094.txt > > > The current HDFS code typically places one replica on local rack, the second > replica on remote random rack and the third replica on a random node of that > remote rack. This algorithm is baked in the NameNode's code. It would be nice > to make the block placement algorithm a pluggable interface. This will allow > experimentation of different placement algorithms based on workloads, > availability guarantees and failure models. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira