[jira] [Updated] (HDFS-5386) Add feature documentation for datanode caching.
[ https://issues.apache.org/jira/browse/HDFS-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5386: Target Version/s: 3.0.0 (was: HDFS-4949) Affects Version/s: (was: HDFS-4949) 3.0.0 > Add feature documentation for datanode caching. > --- > > Key: HDFS-5386 > URL: https://issues.apache.org/jira/browse/HDFS-5386 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Colin Patrick McCabe > Attachments: HDFS-5386-caching.001.patch, HDFS-5386-caching.002.patch > > > Write feature documentation for datanode caching, covering all of the > following: > * high-level architecture > * OS/native code requirements > * OS configuration (ulimit -l) > * new configuration properties for namenode and datanode > * cache admin CLI commands > * pointers to API for programmatic control of caching directives -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5386) Add feature documentation for datanode caching.
[ https://issues.apache.org/jira/browse/HDFS-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5386: Attachment: HDFS-5386-caching.002.patch I'm attaching patch version 2. This folds in all of my prior feedback, except for 2 items: * I didn't include the high-level architecture diagram. Maybe this would be easier for either Colin or Andrew if they have access to the original source document. * I didn't add references to the {{DistributedFileSystem}} API, because this isn't included in the JavaDocs. > Add feature documentation for datanode caching. > --- > > Key: HDFS-5386 > URL: https://issues.apache.org/jira/browse/HDFS-5386 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Affects Versions: HDFS-4949 >Reporter: Chris Nauroth >Assignee: Colin Patrick McCabe > Attachments: HDFS-5386-caching.001.patch, HDFS-5386-caching.002.patch > > > Write feature documentation for datanode caching, covering all of the > following: > * high-level architecture > * OS/native code requirements > * OS configuration (ulimit -l) > * new configuration properties for namenode and datanode > * cache admin CLI commands > * pointers to API for programmatic control of caching directives -- This message was sent by Atlassian JIRA (v6.1#6144)
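Since the {{DistributedFileSystem}} calls aren't covered by the published JavaDocs yet, a hedged sketch of what the programmatic-control section could eventually point at may still help readers. The sketch below uses the class and method names the caching API settled on later ({{CachePoolInfo}}, {{CacheDirectiveInfo}}, {{addCachePool}}, {{addCacheDirective}}); the names on the HDFS-4949 branch at the time of this patch were still in flux, so treat them as assumptions rather than the documented API:
{code}
// Hedged sketch only: the kind of programmatic control the docs could point
// to. Class and method names (CachePoolInfo, CacheDirectiveInfo, addCachePool,
// addCacheDirective) are assumptions based on how the API later stabilized;
// assumes fs.defaultFS points at an HDFS cluster with caching enabled.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CachingApiSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Create a cache pool, then ask the cluster to keep a path in memory.
    dfs.addCachePool(new CachePoolInfo("analytics"));
    dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/warehouse/hot-table"))   // hypothetical path
        .setPool("analytics")
        .build());
  }
}
{code}
The {{hdfs cacheadmin}} commands described in the patch expose the same operations from the command line.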
[jira] [Created] (HDFS-5441) Wrong use of catalina opts in httpfs.sh
Dridi Boukelmoune created HDFS-5441: --- Summary: Wrong use of catalina opts in httpfs.sh Key: HDFS-5441 URL: https://issues.apache.org/jira/browse/HDFS-5441 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Dridi Boukelmoune Hey there, There is a comment mentioning a bug in catalina.sh (tomcat) in httpfs.sh: https://github.com/apache/hadoop-common/blob/1f2a21f/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh#L51 This behavior (not using those opts when stopping) is the very purpose of the CATALINA_OPTS variable as documented in catalina.sh: https://github.com/apache/tomcat/blob/d88ad9e/bin/catalina.sh#L36 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807709#comment-13807709 ] Hadoop QA commented on HDFS-5252: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610767/HDFS-5252.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5303//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5303//console This message is automatically generated. > Stable write is not handled correctly in someplace > -- > > Key: HDFS-5252 > URL: https://issues.apache.org/jira/browse/HDFS-5252 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Reporter: Brandon Li >Assignee: Brandon Li > Attachments: HDFS-5252.001.patch > > > When the client asks for a stable write but the prerequisite writes have not > been transferred to the NFS gateway, the requested stability can't be honored. > The NFS gateway has to treat the write as an unstable write and set the flag > to UNSTABLE in the write response. > One bug was found while testing with an Ubuntu client copying a 1KB file. For > small files like this, the Ubuntu client does one stable write (with the > FILE_SYNC flag). However, the NFS gateway missed one place > ({{OpenFileCtx#doSingleWrite}}) where it sends the response with the flag NOT > updated to UNSTABLE. > With this bug, the client thinks the write is on disk and thus never sends a > COMMIT. The subsequent test that reads the data back fails, since the data was > never synced. -- This message was sent by Atlassian JIRA (v6.1#6144)
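The rule being fixed is easy to state: never advertise more stability than the gateway actually achieved. A minimal, self-contained sketch of that decision, with hypothetical names (this is not the actual {{OpenFileCtx#doSingleWrite}} code):
{code}
// Conceptual sketch of the stable-write downgrade, using hypothetical names.
// The real fix lives in OpenFileCtx#doSingleWrite; this only illustrates the
// rule: if prerequisite writes have not reached the gateway, the response
// must carry UNSTABLE regardless of what the client requested, so the client
// knows it still has to send a COMMIT later.
public class StableWriteSketch {

  enum StableHow { UNSTABLE, DATA_SYNC, FILE_SYNC }

  static StableHow responseStability(StableHow requested,
                                     long requestOffset,
                                     long nextSequentialOffset) {
    // A stable reply is only honest if every byte before this write has
    // already been received and written; otherwise downgrade to UNSTABLE.
    boolean prerequisitesPresent = requestOffset <= nextSequentialOffset;
    if (requested != StableHow.UNSTABLE && !prerequisitesPresent) {
      return StableHow.UNSTABLE;
    }
    return requested;
  }

  public static void main(String[] args) {
    // A FILE_SYNC write whose prerequisite bytes have not arrived yet.
    System.out.println(
        responseStability(StableHow.FILE_SYNC, 4096L, 1024L)); // UNSTABLE
    // A FILE_SYNC write that is in sequence and can be honored.
    System.out.println(
        responseStability(StableHow.FILE_SYNC, 1024L, 1024L)); // FILE_SYNC
  }
}
{code}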
[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li reassigned HDFS-5252: Assignee: Brandon Li -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Attachment: HDFS-5252.001.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Summary: Stable write is not handled correctly in someplace (was: Stable write is handled correctly in someplace) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807681#comment-13807681 ] Hadoop QA commented on HDFS-2832: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610761/h2832_20131028b.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5302//console This message is automatically generated. > Enable support for heterogeneous storages in HDFS > - > > Key: HDFS-2832 > URL: https://issues.apache.org/jira/browse/HDFS-2832 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.24.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Attachments: 20130813-HeterogeneousStorage.pdf, > h2832_20131023b.patch, h2832_20131023.patch, h2832_20131025.patch, > h2832_20131028b.patch, h2832_20131028.patch > > > HDFS currently supports a configuration where storages are a list of > directories. Typically each of these directories corresponds to a volume with > its own file system. All these directories are homogeneous and therefore > identified as a single storage at the namenode. I propose changing the current > model, where a Datanode *is a* storage, to one where a Datanode *is a > collection of* storages. -- This message was sent by Atlassian JIRA (v6.1#6144)
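As a rough illustration of the proposed model change, here is a toy sketch with made-up types (not the actual namenode classes): instead of one implicit storage per datanode, the namenode tracks a collection of storages keyed by storage ID, each with its own type and space accounting.
{code}
// Conceptual sketch with made-up types: a datanode modeled as a collection of
// storages rather than as a single storage.
import java.util.HashMap;
import java.util.Map;

public class HeterogeneousStorageSketch {

  enum StorageType { DISK, SSD }   // illustrative media tiers

  static class Storage {
    final String storageId;        // unique per configured directory/volume
    final StorageType type;
    long capacity;                 // per-storage space accounting
    long remaining;
    Storage(String storageId, StorageType type) {
      this.storageId = storageId;
      this.type = type;
    }
  }

  static class Datanode {
    final String datanodeUuid;
    // Before the change the namenode effectively saw one storage here; with
    // the change it tracks each configured directory/volume separately.
    final Map<String, Storage> storages = new HashMap<>();
    Datanode(String datanodeUuid) { this.datanodeUuid = datanodeUuid; }
    void reportStorage(Storage s) { storages.put(s.storageId, s); }
  }

  public static void main(String[] args) {
    Datanode dn = new Datanode("dn-uuid-1");
    dn.reportStorage(new Storage("DS-disk-1", StorageType.DISK));
    dn.reportStorage(new Storage("DS-ssd-1", StorageType.SSD));
    System.out.println(dn.storages.size() + " storages on " + dn.datanodeUuid);
  }
}
{code}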
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-2832: - Attachment: h2832_20131028b.patch Jenkins failed to produce a result, so I renamed the patch and am submitting it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5433) When reloading fsimage during checkpointing, we should clear existing snapshottable directories
[ https://issues.apache.org/jira/browse/HDFS-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807600#comment-13807600 ] Todd Lipcon commented on HDFS-5433: --- looks good to me too, +1 with Vinay's comments addressed > When reloading fsimage during checkpointing, we should clear existing > snapshottable directories > --- > > Key: HDFS-5433 > URL: https://issues.apache.org/jira/browse/HDFS-5433 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.2.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Critical > Attachments: HDFS-5433.patch > > > The complete set of snapshottable directories are referenced both via the > file system tree and in the SnapshotManager class. It's possible that when > the 2NN performs a checkpoint, it will reload its in-memory state based on a > new fsimage from the NN, but will not clear the set of snapshottable > directories referenced by the SnapshotManager. In this case, the 2NN will > write out an fsimage that cannot be loaded, since the integer written to the > fsimage indicating the number of snapshottable directories will be out of > sync with the actual number of snapshottable directories serialized to the > fsimage. > This is basically the same as HDFS-3835, but for snapshottable directories > instead of delegation tokens. -- This message was sent by Atlassian JIRA (v6.1#6144)
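The invariant behind the fix is small but easy to violate: the snapshottable-directory set held by the SnapshotManager must be reset whenever the 2NN discards its in-memory state and reloads a fresh fsimage, or the count written at the next checkpoint no longer matches the directories actually serialized. A toy sketch of that invariant, with hypothetical classes (not the actual patch):
{code}
// Toy sketch with hypothetical classes: why stale SnapshotManager state
// corrupts the next checkpoint, and why clearing it before a reload fixes it.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapshottableReloadSketch {

  static class SnapshotManager {
    final Set<String> snapshottableDirs = new HashSet<>();
    void addSnapshottable(String path) { snapshottableDirs.add(path); }
    void clear() { snapshottableDirs.clear(); } // the step the bug was missing
  }

  static class Checkpointer {
    final SnapshotManager sm = new SnapshotManager();
    final List<String> tree = new ArrayList<>(); // stand-in for the fs tree

    void loadFsImage(List<String> snapshottableInImage) {
      sm.clear();            // without this, entries from the old image linger
      tree.clear();
      tree.addAll(snapshottableInImage);
      for (String dir : snapshottableInImage) {
        sm.addSnapshottable(dir);
      }
    }

    void saveFsImage() {
      // The count written to the image must match the directories serialized
      // from the tree; stale manager state makes the two numbers disagree and
      // the resulting image cannot be loaded.
      int countWritten = sm.snapshottableDirs.size();
      int dirsSerialized = tree.size();
      if (countWritten != dirsSerialized) {
        throw new IllegalStateException("fsimage would be unloadable: "
            + countWritten + " != " + dirsSerialized);
      }
    }
  }

  public static void main(String[] args) {
    Checkpointer c = new Checkpointer();
    c.loadFsImage(Arrays.asList("/a", "/b"));
    c.loadFsImage(Arrays.asList("/a")); // second checkpoint: "/b" must not leak
    c.saveFsImage();                    // passes only because loadFsImage clears
  }
}
{code}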
[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807586#comment-13807586 ] Colin Patrick McCabe commented on HDFS-5394:

Andrew, if I understand your proposal correctly, you're proposing to split the {{replicaMap}} into three maps: {{beingCachedReplicaMap}}, {{cachedReplicaMap}}, and {{beingUncachedReplicaMap}}, and protect all three with a big lock. It seems like this will actually result in more code, since we'll have to check multiple maps in many cases (i.e., we don't want to advertise something that is being uncached, and we don't want to start caching something that is already cached or currently being uncached). We could combine it into 2 maps with some funky booleans, but I think it would get pretty confusing. I really just wanted a unified map that tells me where everything is, not 2 or 3 maps. From an efficiency point of view, 3 maps are also worse than 1, as you know :) This is particularly annoying with {{HashMap}}, since its memory consumption never shrinks, but only grows as needed.

I think a big part of why the complexity exists today is that we have to drop the (conceptual) lock when doing the mmap or munmap operation. This is a requirement, since they are potentially long-running operations. This in turn results in some complexity, since once we finish the mmap, we have to retake the lock and figure out whether the world changed underneath us. For example, someone could have cancelled the caching operation while we released the lock and started doing our thing. This complexity doesn't go away when you split the maps-- in fact, it gets worse, since you have to remember to check all of them. If you think the compare-and-swap stuff is too complex, I could use a mutex for that, but again, it's going to be a similar amount of code, since it's doing a similar thing.

Re: background sweeper thread. Isn't that pretty much equivalent to having a single Executor in {{FsDatasetCache}}, like this patch adds? I kind of like the {{Executor}} approach since it will tear down the thread after a few minutes of inactivity. But perhaps I could be convinced otherwise. Anyway, I'd rather do that refactoring later if possible.

bq. FsDatasetCache#Key#equals: This uses a string comparison of the class name. Should it do a reference-equals of the Class objects instead?

Sure.

bq. FsDatasetCache#getCachedBlocks: This method is no longer filtering by block pool. The bpid argument is unused.

Fixed.

bq. FsDatasetCache#cacheBlock: Does it make sense to move all I/O, including opening the streams, behind the CachingTask? If so, then this would also simplify the error handling, because you wouldn't need to decrement usedBytes and close the streams here.

I think that's a good idea. I'll see if I can reorganize it along those lines.

bq. MappableBlock#mlocker: Can you please annotate this as @VisibleForTesting?

OK.

bq. MappableBlock#load: Regarding the null check of blockChannel, is it actually possible for FileInputStream#getChannel to return null, or was this done for defensive coding purposes? (No objection if it's just defensive coding. I'm just curious if you know of a particular condition that causes this.)

I checked out the JDK source, and I don't think {{FileInputStream#getChannel}} can ever return null. I guess when I wrote this, I was thinking of the {{Socket}} API, where {{getChannel}} sometimes does return null. It's probably best to remove this null check, since the API documentation is pretty clear, and Java catches such conditions anyway.

bq. MappableBlock#verifyChecksum: This is now passing a hard-coded file name to DataChecksum#verifyChunkedSums. Should this be switched back to the block file name?

I was having some difficulty getting at the block file name. It's not provided by {{getBlockInputStream}} or {{getMetaDataInputStream}}. It turns out that it's available through the {{ReplicaInfo}}, though. Will fix.

bq. {{TestFsDatasetCache#testUncachingBlocksBeforeCachingFinishes}}...

I guess I don't really have a great solution to this. The problem is that we currently don't really know when the {{DNA_UNCACHE}} messages reach the DN. Setting the heartbeat responses is one thing, but these responses won't be sent until the DN sends its own heartbeat to the NN. It's an async process. We could perhaps hook into the heartbeat handling code in the DN, but a simpler solution might just be using a delay 2x or 3x longer than the configured heartbeat. In practice that would be 3 seconds or so.
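As an aside on the {{Key#equals}} point agreed above, the change is plain Java rather than anything HDFS-specific: two {{Class}} objects loaded by the same classloader can be compared by reference, which is both cheaper and stricter than comparing class names. A small sketch with illustrative field names:
{code}
// Sketch of the review comment on FsDatasetCache.Key#equals (field names are
// illustrative): compare the Class objects themselves instead of their names.
import java.util.Objects;

public class KeyEqualsSketch {

  static final class Key {
    final long blockId;
    final String bpid;
    Key(long blockId, String bpid) { this.blockId = blockId; this.bpid = bpid; }

    @Override
    public boolean equals(Object o) {
      if (o == null) {
        return false;
      }
      // Before: o.getClass().getName().equals(this.getClass().getName())
      // After: a reference comparison of the Class objects, which is exact for
      // classes loaded by the same classloader and avoids the String work.
      if (o.getClass() != this.getClass()) {
        return false;
      }
      Key other = (Key) o;
      return blockId == other.blockId && bpid.equals(other.bpid);
    }

    @Override
    public int hashCode() {
      return Objects.hash(blockId, bpid);
    }
  }

  public static void main(String[] args) {
    System.out.println(new Key(1L, "bp-1").equals(new Key(1L, "bp-1"))); // true
    System.out.println(new Key(1L, "bp-1").equals(new Key(2L, "bp-1"))); // false
  }
}
{code}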
[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807564#comment-13807564 ] Hadoop QA commented on HDFS-5438: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610713/HDFS-5438-1.trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5300//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5300//console This message is automatically generated. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807561#comment-13807561 ] Haohui Mai commented on HDFS-5333: -- The browser fetches the page and then fetches the data it needs from JMX directly. The HTTP requests look like the following: {noformat} http:///static/hadoop.css http:///dfshealth.html http:///jmx/foobar {noformat} The HTTP requests for the old web UI look like the following: {noformat} http:///dfshealth.jsp http:///static/hadoop.css {noformat} Therefore: * You can access the new web UI if you can access the old one, regardless of the settings of the port-based firewall. * If you access the old web UI through a proxy, the setup for the new web UI is similar. Hope that answers your question. > Improvement of current HDFS Web UI > -- > > Key: HDFS-5333 > URL: https://issues.apache.org/jira/browse/HDFS-5333 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Haohui Mai > > This is an umbrella jira for improving the current JSP-based HDFS Web UI. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807555#comment-13807555 ] Larry McCay commented on HDFS-5333: --- Okay, I may be off base then. Are the REST APIs being invoked from the browser or not? If they are, then they won't be able to get to the services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807554#comment-13807554 ] Haohui Mai commented on HDFS-5333: -- The server serves both the old and the new web UI on exactly the same HTTP / HTTPS port. You're accessing the JSP pages and the new web UI through the same port, so I believe this is a non-issue. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807548#comment-13807548 ] Larry McCay commented on HDFS-5333: --- Well, I think it is important to consider that server-side code executes within the cluster (on the other side of the firewall) and therefore has direct access to the service endpoints. So, in that respect, the old web UI will work - assuming the port is open so it can be reached from the outside. In the new UI, the connections are made from the client, which will need to go through the gateway to reach the services. Unless I am missing something. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807547#comment-13807547 ] Hudson commented on HDFS-4949: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4664 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4664/]) Merge HDFS-4949 branch back into trunk (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536572) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BatchedRemoteIterator.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HasEnhancedByteBufferAccess.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ReadOption.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ZeroCopyUnavailableException.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/permission/FsPermission.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/ByteBufferPool.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/ElasticByteBufferPool.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/IdentityHashStore.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/IntrusiveCollection.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightCache.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringUtils.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/nativeio/TestNativeIO.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestIdentityHashStore.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightGSet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmapManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/AddPathBasedCacheDirectiveException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LayoutVersion.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/PathBasedCacheDescriptor.java
[jira] [Updated] (HDFS-5320) Add datanode caching metrics
[ https://issues.apache.org/jira/browse/HDFS-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5320: --- Target Version/s: 3.0.0 (was: HDFS-4949) Affects Version/s: (was: HDFS-4949) 3.0.0 > Add datanode caching metrics > > > Key: HDFS-5320 > URL: https://issues.apache.org/jira/browse/HDFS-5320 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-5320-1.patch, hdfs-5320-2.patch > > > It'd be good to hook up datanode metrics for # (blocks/bytes) > (cached/uncached/failed to cache) over different time windows > (eternity/1hr/10min/1min). -- This message was sent by Atlassian JIRA (v6.1#6144)
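A minimal sketch of the counters involved, in plain Java with hypothetical names; the actual patch would expose them through the datanode's metrics2 source and add the eternity/1hr/10min/1min windows rather than single monotonic counters:
{code}
// Plain-Java sketch with hypothetical names. The actual change would register
// these through Hadoop's metrics2 framework and keep rolling
// eternity/1hr/10min/1min windows instead of single monotonic counters.
import java.util.concurrent.atomic.AtomicLong;

public class DataNodeCachingMetricsSketch {
  final AtomicLong blocksCached = new AtomicLong();
  final AtomicLong blocksUncached = new AtomicLong();
  final AtomicLong blocksFailedToCache = new AtomicLong();
  final AtomicLong bytesCached = new AtomicLong();

  void onCacheSuccess(long blockBytes) {
    blocksCached.incrementAndGet();
    bytesCached.addAndGet(blockBytes);
  }

  void onCacheFailure() {
    blocksFailedToCache.incrementAndGet();
  }

  void onUncache(long blockBytes) {
    blocksUncached.incrementAndGet();
    bytesCached.addAndGet(-blockBytes);
  }

  public static void main(String[] args) {
    DataNodeCachingMetricsSketch m = new DataNodeCachingMetricsSketch();
    m.onCacheSuccess(134217728L);  // one 128MB block mlocked
    m.onCacheFailure();            // e.g. the mlock limit was hit
    System.out.println(m.blocksCached + " cached, " + m.blocksFailedToCache
        + " failed, " + m.bytesCached + " bytes pinned");
  }
}
{code}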
[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin
[ https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5326: --- Target Version/s: 3.0.0 (was: HDFS-4949) Affects Version/s: 3.0.0 > add modifyDirective to cacheAdmin > - > > Key: HDFS-5326 > URL: https://issues.apache.org/jira/browse/HDFS-5326 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe > > We should add a way of modifying cache directives on the command-line, > similar to how modifyCachePool works. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5253) Add requesting user's name to PathBasedCacheEntry
[ https://issues.apache.org/jira/browse/HDFS-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5253: --- Affects Version/s: (was: HDFS-4949) > Add requesting user's name to PathBasedCacheEntry > - > > Key: HDFS-5253 > URL: https://issues.apache.org/jira/browse/HDFS-5253 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Andrew Wang >Assignee: Andrew Wang > > It'll be useful to have the requesting user's name in {{PathBasedCacheEntry}} > for tracking per-user statistics (e.g. amount of data cached by a user). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching
[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5394: --- Target Version/s: 3.0.0 (was: HDFS-4949) Affects Version/s: (was: HDFS-4949) 3.0.0 > fix race conditions in DN caching and uncaching > --- > > Key: HDFS-5394 > URL: https://issues.apache.org/jira/browse/HDFS-5394 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-5394-caching.001.patch, > HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, > HDFS-5394-caching.004.patch > > > The DN needs to handle situations where it is asked to cache the same replica > more than once. (Currently, it can actually do two mmaps and mlocks.) It > also needs to handle the situation where caching a replica is cancelled > before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
[ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5182: --- Target Version/s: 3.0.0 (was: HDFS-4949) Affects Version/s: (was: HDFS-4949) 3.0.0 > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid > - > > Key: HDFS-5182 > URL: https://issues.apache.org/jira/browse/HDFS-5182 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > BlockReaderLocal must allow zero-copy reads only when the DN believes it's > valid. This implies adding a new field to the response to > REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the > client to the DN, so that the DN can inform the client when the mapped region > is no longer locked into memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807502#comment-13807502 ] Hadoop QA commented on HDFS-5438: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610696/HDFS-5438.trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//console This message is automatically generated. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807499#comment-13807499 ] Haohui Mai commented on HDFS-5333: -- Hi [~lmccay], Thanks for the input! This is complementary to the web UI problem. I believe the old web UI does not work in the scenario you mentioned. The new web UI won't work for now either, as there are a few places where the code uses absolute URLs. However, this can be easily fixed in the new web UI. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807491#comment-13807491 ] Haohui Mai commented on HDFS-5436: -- The planned support of HTTPS in hftp and webhdfs requires even more shared code. Putting all three filesystems in the same package allows us to limit the visibility of code that is only used by these filesystems. This refactoring should improve the readability and the modularity of the hftp / hsftp / webhdfs implementations. > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, > HDFS-5436.002.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This forces several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
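The visibility argument, roughly: once HftpFileSystem, HsftpFileSystem and WebHdfsFileSystem live in the same package, shared helpers such as {{URLConnectionFactory}} can drop back to package-private instead of being public for cross-package access. A schematic example (simplified and hypothetical, not the real URLConnectionFactory):
{code}
// Schematic only: shows the visibility the move enables, not the real
// URLConnectionFactory. With hftp, hsftp and webhdfs all living in
// org.apache.hadoop.hdfs.web, a shared helper no longer needs to be public.
package org.apache.hadoop.hdfs.web;

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class UrlConnectionHelperSketch {           // package-private class
  // Package-private instead of public: only the filesystems co-located in
  // this package are expected to call it.
  static URLConnection open(URL url, int timeoutMs) throws IOException {
    URLConnection conn = url.openConnection(); // no network I/O until connect()
    conn.setConnectTimeout(timeoutMs);
    conn.setReadTimeout(timeoutMs);
    return conn;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical namenode address, for illustration only.
    URLConnection c = open(new URL("http://namenode.example.com:50070/jmx"), 60000);
    System.out.println("connect timeout = " + c.getConnectTimeout());
  }
}
{code}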
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807483#comment-13807483 ] Brandon Li commented on HDFS-5436: -- {quote}This forces several methods in ByteInputStream and URLConnectionFactory to be public methods.{quote} The patch moves the HTTP access related classes from org.apache.hdfs into org.apache.hdfs.web. Are there any other reasons that justify the move besides the one you listed above? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5333) Improvement of current HDFS Web UI
[ https://issues.apache.org/jira/browse/HDFS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807481#comment-13807481 ] Larry McCay commented on HDFS-5333: --- Interesting work! It seems to me that we may need to consider deployments where a gateway such as Knox is between the UI client and the Hadoop cluster. How are the relevant URLs configured for the deployment - are they easily configured for a particular deployment scenario such as this? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-2832: Attachment: h2832_20131028.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5437) TestBlockReport and TestBPOfferService fail due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Attachment: h5437.05.patch Including a trivial fix in {{SimulatedFSDataset#getStorageReports}}. > TestBlockReport and TestBPOfferService fail due to test issues > -- > > Key: HDFS-5437 > URL: https://issues.apache.org/jira/browse/HDFS-5437 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: h5437.03.patch, h5437.04.patch, h5437.05.patch > > > There are a few more test issues in {{TestBlockReport}} caused by the earlier > changes. > {{testBlockReport_07}} fails and it looks like a test issue. > {code} > Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport > Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport > blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time > elapsed: 19.314 sec <<< FAILURE! > java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at > org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) > {code} > {{TestBPOfferService}} fails due to missing implementation of > {{SimulatedFSDataset#getStorageReports}}. > {code} > 2013-10-28 16:24:33,775 ERROR datanode.DataNode > (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool > fake bpid (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0 > java.lang.UnsupportedOperationException at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getStorageReports(SimulatedFSDataset.java:1005) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:478) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:566) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717) > at java.lang.Thread.run(Thread.java:695) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5437) TestBlockReport and TestBPOfferService fail due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Description: There are a few more test issues in {{TestBlockReport}} caused by the earlier changes. {{testBlockReport_07}} fails and it looks like a test issue. {code} Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time elapsed: 19.314 sec <<< FAILURE! java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) {code} {{TestBPOfferService}} fails due to missing implementation of {{SimulatedFSDataset#getStorageReports}}. {code} 2013-10-28 16:24:33,775 ERROR datanode.DataNode (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool fake bpid (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0 java.lang.UnsupportedOperationException at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getStorageReports(SimulatedFSDataset.java:1005) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:478) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:566) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:717) at java.lang.Thread.run(Thread.java:695) {code} was: There are a few more test issues in {{TestBlockReport}} caused by the earlier changes. {{testBlockReport_07}} fails and it looks like a test issue. {code} Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time elapsed: 19.314 sec <<< FAILURE! java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) {code} > TestBlockReport and TestBPOfferService fail due to test issues > -- > > Key: HDFS-5437 > URL: https://issues.apache.org/jira/browse/HDFS-5437 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: h5437.03.patch, h5437.04.patch > > > There are a few more test issues in {{TestBlockReport}} caused by the earlier > changes. > {{testBlockReport_07}} fails and it looks like a test issue. > {code} > Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport > Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport > blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time > elapsed: 19.314 sec <<< FAILURE! 
> java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at > org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) > {code} > {{TestBPOfferService}} fails due to missing implementation of > {{SimulatedFSDataset#getStorageReports}}. > {code} > 2013-10-28 16:24:33,775 ERROR datanode.DataNode > (BPServiceActor.java:run(719)) - Exception in BPOfferService for Block pool > fake bpid (Datanode Uuid null) service to 0.0.0.0/0.0.0.0:0 > java.lang.UnsupportedOperationException at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getStorageReports(SimulatedFSDataset.java:1005) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:478) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:566) > at > org.apache.hadoop.hdfs.server.datano
[jira] [Updated] (HDFS-5437) TestBlockReport and TestBPOfferService fail due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Summary: TestBlockReport and TestBPOfferService fail due to test issues (was: TestBlockReport fails due to test issues) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Attachment: HDFS-5438-1.trunk.patch The new patch adds a check for the gen stamp in the case where the stored block state is UNDER_CONSTRUCTION and the reported replica state is FINALIZED. -- This message was sent by Atlassian JIRA (v6.1#6144)
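A conceptual sketch of that check with made-up types (not the actual BlockManager code): when the stored block is still UNDER_CONSTRUCTION and a FINALIZED replica is reported, the replica should only count as valid if its generation stamp matches the block's current one, so a node that missed the pipeline recovery cannot be counted at commit time.
{code}
// Conceptual sketch with made-up types: accept a FINALIZED replica report for
// an under-construction block only if the generation stamps agree.
public class GenStampCheckSketch {

  enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }
  enum ReplicaState { FINALIZED, RBW, RWR }

  static class StoredBlock {
    final long blockId;
    long generationStamp;          // bumped by pipeline recovery
    BlockUCState state;
    StoredBlock(long id, long gs, BlockUCState s) {
      blockId = id; generationStamp = gs; state = s;
    }
  }

  static class ReportedReplica {
    final long blockId;
    final long generationStamp;
    final ReplicaState state;
    ReportedReplica(long id, long gs, ReplicaState s) {
      blockId = id; generationStamp = gs; state = s;
    }
  }

  // Returns true if the replica may be counted as a valid copy.
  static boolean acceptReplica(StoredBlock stored, ReportedReplica reported) {
    if (stored.state == BlockUCState.UNDER_CONSTRUCTION
        && reported.state == ReplicaState.FINALIZED) {
      // The added check: a stale gen stamp means the node fell out of the
      // pipeline during recovery, so its copy must not count toward commit.
      return reported.generationStamp == stored.generationStamp;
    }
    return true;  // other cases handled elsewhere in block report processing
  }

  public static void main(String[] args) {
    StoredBlock b = new StoredBlock(42L, 1005L, BlockUCState.UNDER_CONSTRUCTION);
    System.out.println(acceptReplica(b,
        new ReportedReplica(42L, 1001L, ReplicaState.FINALIZED)));  // false
    System.out.println(acceptReplica(b,
        new ReportedReplica(42L, 1005L, ReplicaState.FINALIZED)));  // true
  }
}
{code}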
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807446#comment-13807446 ] Hadoop QA commented on HDFS-5436: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610662/HDFS-5436.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-extras. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5295//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5295//console This message is automatically generated. > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, > HDFS-5436.002.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Attachment: h5437.04.patch > TestBlockReport fails due to test issues > > > Key: HDFS-5437 > URL: https://issues.apache.org/jira/browse/HDFS-5437 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: h5437.03.patch, h5437.04.patch > > > There are a few more test issues in {{TestBlockReport}} caused by the earlier > changes. > {{testBlockReport_07}} fails and it looks like a test issue. > {code} > Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport > Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport > blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time > elapsed: 19.314 sec <<< FAILURE! > java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at > org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5252) Stable write is handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Description: When the client asks for a stable write but the prerequisite writes are not transferred to NFS gateway, the stableness can't be honored. NFS gateway has to treat the write as unstable write and set the flag to UNSTABLE in the write response. One bug was found during test with Ubuntu client when copying one 1KB file. For small files like 1KB file, Ubuntu client does one stable write (with FILE_SYNC flag). However, NFS gateway missed one place where(OpenFileCtx#doSingleWrite) it sends response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send COMMIT anymore. The following test tries to read the data back and of course fails to do so since the data was not synced. was: When the client asks for a stable write but the prerequisite writes are not transferred to NFS gateway, the stableness can't be honored. NFS gateway has to treat the write as unstable write and set the flag to UNSTABLE in the write response. One bug was found during test with Ubuntu client when copying one 1KB file. For small files like 1KB file, Ubuntu client does one stable write (with FILE_SYNC flag). However, NFS gateway missed one place where it sends response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send COMMIT anymore. The following test tries to read the data back and of course fails to do so since the data was not synced. > Stable write is handled correctly in someplace > -- > > Key: HDFS-5252 > URL: https://issues.apache.org/jira/browse/HDFS-5252 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Reporter: Brandon Li > > When the client asks for a stable write but the prerequisite writes are not > transferred to NFS gateway, the stableness can't be honored. NFS gateway has > to treat the write as unstable write and set the flag to UNSTABLE in the > write response. > One bug was found during test with Ubuntu client when copying one 1KB file. > For small files like 1KB file, Ubuntu client does one stable write (with > FILE_SYNC flag). However, NFS gateway missed one place > where(OpenFileCtx#doSingleWrite) it sends response with the flag NOT updated > to UNSTABLE. > With this bug, the client thinks the write is on disk and thus doesn't send > COMMIT anymore. The following test tries to read the data back and of course > fails to do so since the data was not synced. -- This message was sent by Atlassian JIRA (v6.1#6144)
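A minimal sketch of the response handling described above; {{WriteStableHow}} is redeclared locally and the method is hypothetical, so this is illustrative only and not the actual OpenFileCtx#doSingleWrite code.

{code}
// Illustrative sketch only; not the real NFS gateway code path.
enum WriteStableHow { UNSTABLE, DATA_SYNC, FILE_SYNC }

final class StableWriteResponseSketch {
  static WriteStableHow stablenessForResponse(WriteStableHow requested,
                                              boolean prerequisiteWritesReceived) {
    // If earlier writes have not reached the gateway yet, the data cannot have
    // been synced, so the response must advertise UNSTABLE even when the client
    // asked for FILE_SYNC; otherwise the client skips the COMMIT it still needs.
    return prerequisiteWritesReceived ? requested : WriteStableHow.UNSTABLE;
  }
}
{code}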
[jira] [Updated] (HDFS-5252) Stable write is handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Description: When the client asks for a stable write but the prerequisite writes are not transferred to NFS gateway, the stableness can't be honored. NFS gateway has to treat the write as unstable write and set the flag to UNSTABLE in the write response. One bug was found during test with Ubuntu client when copying one 1KB file. For small files like 1KB file, Ubuntu client does one stable write (with FILE_SYNC flag). However, NFS gateway missed one place where it sends response with the flag NOT updated to UNSTABLE. With this bug, the client thinks the write is on disk and thus doesn't send COMMIT anymore. The following test tries to read the data back and of course fails to do so since the data was not synced. was:When the client asks for a stable write but the prerequisite writes are not transferred to NFS gateway, the stableness can't be honored. NFS gateway has to treat the write as unstable write. > Stable write is handled correctly in someplace > -- > > Key: HDFS-5252 > URL: https://issues.apache.org/jira/browse/HDFS-5252 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Reporter: Brandon Li > > When the client asks for a stable write but the prerequisite writes are not > transferred to NFS gateway, the stableness can't be honored. NFS gateway has > to treat the write as unstable write and set the flag to UNSTABLE in the > write response. > One bug was found during test with Ubuntu client when copying one 1KB file. > For small files like 1KB file, Ubuntu client does one stable write (with > FILE_SYNC flag). However, NFS gateway missed one place where it sends > response with the flag NOT updated to UNSTABLE. > With this bug, the client thinks the write is on disk and thus doesn't send > COMMIT anymore. The following test tries to read the data back and of course > fails to do so since the data was not synced. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5252) Stable write is handled correctly in someplace
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Summary: Stable write is handled correctly in someplace (was: Stable write is handled correctly) > Stable write is handled correctly in someplace > -- > > Key: HDFS-5252 > URL: https://issues.apache.org/jira/browse/HDFS-5252 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Reporter: Brandon Li > > When the client asks for a stable write but the prerequisite writes are not > transferred to NFS gateway, the stableness can't be honored. NFS gateway has > to treat the write as unstable write. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5252) Stable write is handled correctly
[ https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5252: - Summary: Stable write is handled correctly (was: Do unstable write only when stable write can't be honored) > Stable write is handled correctly > - > > Key: HDFS-5252 > URL: https://issues.apache.org/jira/browse/HDFS-5252 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Reporter: Brandon Li > > When the client asks for a stable write but the prerequisite writes are not > transferred to NFS gateway, the stableness can't be honored. NFS gateway has > to treat the write as unstable write. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5440) Extract the logic of handling delegation tokens in HftpFileSystem to the TokenAspect class
Haohui Mai created HDFS-5440: Summary: Extract the logic of handling delegation tokens in HftpFileSystem to the TokenAspect class Key: HDFS-5440 URL: https://issues.apache.org/jira/browse/HDFS-5440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai The logic for handling delegation tokens in HftpFileSystem and WebHdfsFileSystem is mostly identical. To simplify the code, this JIRA proposes to extract the common code into a new class named TokenAspect. -- This message was sent by Atlassian JIRA (v6.1#6144)
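A rough sketch of the kind of extraction being proposed; the {{TokenAspect}} API shown here is invented for illustration and does not reflect the eventual patch.

{code}
// Illustrative only: a shared, composed helper for token bookkeeping that both
// file systems can reuse instead of each carrying a private copy of the logic.
final class TokenAspectSketch<T> {
  private T cachedToken; // stand-in for a cached delegation token

  synchronized T ensureToken(java.util.function.Supplier<T> fetcher) {
    if (cachedToken == null) {
      cachedToken = fetcher.get(); // fetch once, reuse for subsequent requests
    }
    return cachedToken;
  }
}

final class HftpLikeFileSystemSketch {
  private final TokenAspectSketch<String> tokens = new TokenAspectSketch<>();

  String delegationToken() {
    return tokens.ensureToken(() -> "token-from-server"); // hypothetical fetch
  }
}
{code}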
[jira] [Updated] (HDFS-5439) Fix TestPendingReplications
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5439: Assignee: (was: Arpit Agarwal) > Fix TestPendingReplications > --- > > Key: HDFS-5439 > URL: https://issues.apache.org/jira/browse/HDFS-5439 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal > > {{TestPendingReplication}} fails with the following exception: > {code} > java.lang.AssertionError: expected:<4> but was:<3> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at org.junit.Assert.assertEquals(Assert.java:456) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5439) Fix TestPendingReplications
[ https://issues.apache.org/jira/browse/HDFS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807418#comment-13807418 ] Arpit Agarwal commented on HDFS-5439: - The same issue appears to cause a failure in {{TestBlockReport#blockReport_07}}. > Fix TestPendingReplications > --- > > Key: HDFS-5439 > URL: https://issues.apache.org/jira/browse/HDFS-5439 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > > {{TestPendingReplication}} fails with the following exception: > {code} > java.lang.AssertionError: expected:<4> but was:<3> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at org.junit.Assert.assertEquals(Assert.java:456) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Attachment: h5437.03.patch Some refactoring of the test case. Also added two {{@VisibleForTesting}} methods to {{BlockListAsLongs}}. {{TestBlockReport#blockReport_07}} will fail on a different assertion now due to HDFS-5439. > TestBlockReport fails due to test issues > > > Key: HDFS-5437 > URL: https://issues.apache.org/jira/browse/HDFS-5437 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: h5437.03.patch > > > There are a few more test issues in {{TestBlockReport}} caused by the earlier > changes. > {{testBlockReport_07}} fails and it looks like a test issue. > {code} > Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport > Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport > blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time > elapsed: 19.314 sec <<< FAILURE! > java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at > org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Attachment: HDFS-5438.trunk.patch > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Status: Patch Available (was: Open) > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 0.23.9 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Attachment: (was: HDFS-5438.trunk.patch) > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Status: Open (was: Patch Available) > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 0.23.9 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5439) Fix TestPendingReplications
Arpit Agarwal created HDFS-5439: --- Summary: Fix TestPendingReplications Key: HDFS-5439 URL: https://issues.apache.org/jira/browse/HDFS-5439 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal {{TestPendingReplication}} fails with the following exception: {code} java.lang.AssertionError: expected:<4> but was:<3> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication.testBlockReceived(TestPendingReplication.java:186) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807243#comment-13807243 ] Hadoop QA commented on HDFS-5438: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610678/HDFS-5438.trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5297//console This message is automatically generated. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Description: The incremental block reports from data nodes and block commits are asynchronous. This becomes troublesome when the gen stamp for a block is changed during a write pipeline recovery. * If an incremental block report is delayed from a node but NN had enough replicas already, a report with the old gen stamp may be received after block completion. This replica will be correctly marked corrupt. But if the node had participated in the pipeline recovery, a new (delayed) report with the correct gen stamp will come soon. However, this report won't have any effect on the corrupt state of the replica. * If block reports are received while the block is still under construction (i.e. client's call to make block committed has not been received by NN), they are blindly accepted regardless of the gen stamp. If a failed node reports in with the old gen stamp while pipeline recovery is on-going, it will be accepted and counted as valid during commit of the block. Due to the above two problems, correct replicas can be marked corrupt and corrupt replicas can be accepted during commit. So far we have observed two cases in production. * The client hangs forever to close a file. All replicas are marked corrupt. * After the successful close of a file, read fails. Corrupt replicas are accepted during commit and valid replicas are marked corrupt afterward. was: The incremental block reports from data nodes and block commits are asynchronous. This becomes troublesome when the gen stamp for a block is changed during a write pipeline recovery. * If an incremental block report is delayed from a node but NN had enough replicas already, a report with the old gen stamp may be received after block completion. This replica will be correctly marked corrupt. But if the node had participated in the pipeline recovery, a new (delayed) report with the correct gen stamp will come soon. However, this report won't have any effect on the corrupt state of the replica. * If block reports are received while the block is still under construction (i.e. client's call to make block committed has not been received by NN), they are blindly accepted regardless of the gen stamp. If a failed node reports in with the old gen stamp while pipeline recovery is on-going, it will be accepted and counted as valid during commit of the block. Due to the above two problems, correct replicas can be marked corrupt and corrupt replicas can be accepted during commit. So far we have observed two cases in production. * The client hangs forever to close a file. All replicas are marked corrupt. * After the successful close of a file, read fails. Corrupt replicas are accepted them during commit and valid replicas are marked corrupt afterward. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. 
> * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807241#comment-13807241 ] Hadoop QA commented on HDFS-5438: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610678/HDFS-5438.trunk.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5298//console This message is automatically generated. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Attachment: (was: HDFS-5438.trunk.patch) > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Status: Open (was: Patch Available) Oops. The critical line is commented out for testing in the patch. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 0.23.9 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch, HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Status: Patch Available (was: Open) > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 0.23.9 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch, HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Attachment: HDFS-5438.trunk.patch Reposting the patch. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch, HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807225#comment-13807225 ] Kihwal Lee commented on HDFS-5438: -- This is a high-level description of what the patch does. * The patch makes NN save the list of already reported replicas when starting a pipeline recovery. If a new report with the new gen stamp is not received for the existing replica until the recovery is done, it will be marked corrupt. * If a block report is received for an existing corrupt replica and it is no longer corrupt, NN will remove it from the corrupt replicas map. * If the client cannot close a file because the block does not have enough valid replicas, it eventually gives up rather than hanging forever. It is already failing after a number of retries when adding a new block. It will use the same retry limit in completeFile(), but the timeout will double every time to make it try harder. With the default of 5 retries, a client will wait at least 4 minutes and give up. If NN is not responding, it may wait longer. > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
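The completeFile() retry behavior described in the last bullet can be pictured roughly as below; the retry count, initial delay, and method names are assumptions for the sketch, not the exact values used by the patch.

{code}
// Rough sketch of the doubling-backoff retry described above; constants and
// names are illustrative assumptions only.
final class CompleteFileRetrySketch {
  static boolean completeWithRetries(java.util.function.BooleanSupplier tryComplete,
                                     int maxRetries, long initialDelayMs)
      throws InterruptedException {
    long delayMs = initialDelayMs;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (tryComplete.getAsBoolean()) {
        return true;        // NN reports the last block as complete
      }
      if (attempt == maxRetries) {
        break;              // give up instead of hanging forever
      }
      Thread.sleep(delayMs);
      delayMs *= 2;         // back off harder on every retry
    }
    return false;
  }
}
{code}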
[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5436: - Attachment: HDFS-5436.002.patch > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch, > HDFS-5436.002.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Attachment: HDFS-5438.trunk.patch > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-5438: Assignee: Kihwal Lee > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.9, 2.2.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5438) Flaws in block report processing can cause data loss
[ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5438: - Status: Patch Available (was: Open) > Flaws in block report processing can cause data loss > > > Key: HDFS-5438 > URL: https://issues.apache.org/jira/browse/HDFS-5438 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 0.23.9 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-5438.trunk.patch > > > The incremental block reports from data nodes and block commits are > asynchronous. This becomes troublesome when the gen stamp for a block is > changed during a write pipeline recovery. > * If an incremental block report is delayed from a node but NN had enough > replicas already, a report with the old gen stamp may be received after block > completion. This replica will be correctly marked corrupt. But if the node > had participated in the pipeline recovery, a new (delayed) report with the > correct gen stamp will come soon. However, this report won't have any effect > on the corrupt state of the replica. > * If block reports are received while the block is still under construction > (i.e. client's call to make block committed has not been received by NN), > they are blindly accepted regardless of the gen stamp. If a failed node > reports in with the old gen stamp while pipeline recovery is on-going, it > will be accepted and counted as valid during commit of the block. > Due to the above two problems, correct replicas can be marked corrupt and > corrupt replicas can be accepted during commit. So far we have observed two > cases in production. > * The client hangs forever to close a file. All replicas are marked corrupt. > * After the successful close of a file, read fails. Corrupt replicas are > accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5438) Flaws in block report processing can cause data loss
Kihwal Lee created HDFS-5438: Summary: Flaws in block report processing can cause data loss Key: HDFS-5438 URL: https://issues.apache.org/jira/browse/HDFS-5438 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0, 0.23.9 Reporter: Kihwal Lee Priority: Critical The incremental block reports from data nodes and block commits are asynchronous. This becomes troublesome when the gen stamp for a block is changed during a write pipeline recovery. * If an incremental block report is delayed from a node but NN had enough replicas already, a report with the old gen stamp may be received after block completion. This replica will be correctly marked corrupt. But if the node had participated in the pipeline recovery, a new (delayed) report with the correct gen stamp will come soon. However, this report won't have any effect on the corrupt state of the replica. * If block reports are received while the block is still under construction (i.e. client's call to make block committed has not been received by NN), they are blindly accepted regardless of the gen stamp. If a failed node reports in with the old gen stamp while pipeline recovery is on-going, it will be accepted and counted as valid during commit of the block. Due to the above two problems, correct replicas can be marked corrupt and corrupt replicas can be accepted during commit. So far we have observed two cases in production. * The client hangs forever to close a file. All replicas are marked corrupt. * After the successful close of a file, read fails. Corrupt replicas are accepted them during commit and valid replicas are marked corrupt afterward. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5432) TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.
[ https://issues.apache.org/jira/browse/HDFS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807184#comment-13807184 ] Arpit Agarwal commented on HDFS-5432: - +1 for the updated patch also. > TestDatanodeJsp fails on Windows due to assumption that loopback address > resolves to host name localhost. > - > > Key: HDFS-5432 > URL: https://issues.apache.org/jira/browse/HDFS-5432 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0, 2.2.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Trivial > Attachments: HDFS-5432.1.patch, HDFS-5432.2.patch > > > As discussed in many previous issues, Windows differs from Unixes in that it > does not resolve the loopback address to hostname "localhost". Instead, the > host name remains unresolved as "127.0.0.1". {{TestDatanodeJsp}} fails on > Windows, because it attempts to assert a string match containing "localhost" > as the host name. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807189#comment-13807189 ] Hadoop QA commented on HDFS-5436: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610614/HDFS-5436.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-extras: org.apache.hadoop.hdfs.security.TestDelegationToken {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5293//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5293//console This message is automatically generated. > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5432) TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.
[ https://issues.apache.org/jira/browse/HDFS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5432: Attachment: HDFS-5432.2.patch Thanks for the reviews, but I'm uploading a new version. I think we can avoid the Windows-specific conditional by pulling the correct host name out of the {{InetSocketAddress}}. I re-tested this successfully on Mac and Windows. Does this still look OK? > TestDatanodeJsp fails on Windows due to assumption that loopback address > resolves to host name localhost. > - > > Key: HDFS-5432 > URL: https://issues.apache.org/jira/browse/HDFS-5432 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0, 2.2.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Trivial > Attachments: HDFS-5432.1.patch, HDFS-5432.2.patch > > > As discussed in many previous issues, Windows differs from Unixes in that it > does not resolve the loopback address to hostname "localhost". Instead, the > host name remains unresolved as "127.0.0.1". {{TestDatanodeJsp}} fails on > Windows, because it attempts to assert a string match containing "localhost" > as the host name. -- This message was sent by Atlassian JIRA (v6.1#6144)
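For readers unfamiliar with the Windows behavior, the idea is to assert against whatever host string the socket address actually carries instead of hard-coding "localhost". A minimal illustration of that general idea (the address below is a placeholder and this is not the patch itself):
{code:java}
import java.net.InetSocketAddress;

public class LoopbackHostExample {
  public static void main(String[] args) {
    // Placeholder address; on Windows the loopback address does not
    // reverse-resolve to "localhost", so tests should not assume it does.
    InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 50075);
    String host = addr.getAddress().getHostAddress();
    System.out.println(host);   // "127.0.0.1" on every platform, no reverse DNS
  }
}
{code}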
[jira] [Commented] (HDFS-5435) File append fails to initialize storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807153#comment-13807153 ] Junping Du commented on HDFS-5435: -- Thanks Arpit and Nicholas for the quick response and review! > File append fails to initialize storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Fix For: Heterogeneous Storage (HDFS-2832) > > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Description: There are a few more test issues in {{TestBlockReport}} caused by the earlier changes. {{testBlockReport_07}} fails and it looks like a test issue. {code} Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time elapsed: 19.314 sec <<< FAILURE! java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) {code} was: There are a few more test failures in {{TestBlockReport}} caused by the earlier changes. {{testBlockReport_07}} fails and it looks like a test issue. {code} Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time elapsed: 19.314 sec <<< FAILURE! java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) {code} > TestBlockReport fails due to test issues > > > Key: HDFS-5437 > URL: https://issues.apache.org/jira/browse/HDFS-5437 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > > There are a few more test issues in {{TestBlockReport}} caused by the earlier > changes. > {{testBlockReport_07}} fails and it looks like a test issue. > {code} > Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport > Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport > blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time > elapsed: 19.314 sec <<< FAILURE! > java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at > org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Description: There are a few more test failures in {{TestBlockReport}} caused by the earlier changes. {{testBlockReport_07}} fails and it looks like a test issue. {code} Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time elapsed: 19.314 sec <<< FAILURE! java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) {code} was:There are a few more test failures in TestBlockReport caused by the earlier changes. testBlockReport_07 fails and it looks like a test issue. > TestBlockReport fails due to test issues > > > Key: HDFS-5437 > URL: https://issues.apache.org/jira/browse/HDFS-5437 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > > There are a few more test failures in {{TestBlockReport}} caused by the > earlier changes. > {{testBlockReport_07}} fails and it looks like a test issue. > {code} > Running org.apache.hadoop.hdfs.server.datanode.TestBlockReport > Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 110.824 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestBlockReport > blockReport_07(org.apache.hadoop.hdfs.server.datanode.TestBlockReport) Time > elapsed: 19.314 sec <<< FAILURE! > java.lang.AssertionError: Wrong number of Corrupted blocks expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at > org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_07(TestBlockReport.java:461) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807135#comment-13807135 ] Chris Nauroth commented on HDFS-4949: - +1 for the merge. Thanks again, Andrew and Colin. > Centralized cache management in HDFS > > > Key: HDFS-4949 > URL: https://issues.apache.org/jira/browse/HDFS-4949 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 3.0.0, 2.3.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: caching-design-doc-2013-07-02.pdf, > caching-design-doc-2013-08-09.pdf, caching-design-doc-2013-10-24.pdf, > caching-testplan.pdf, HDFS-4949-consolidated.patch > > > HDFS currently has no support for managing or exposing in-memory caches at > datanodes. This makes it harder for higher level application frameworks like > Hive, Pig, and Impala to effectively use cluster memory, because they cannot > explicitly cache important datasets or place their tasks for memory locality. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5437) Fix TestBlockReport
Arpit Agarwal created HDFS-5437: --- Summary: Fix TestBlockReport Key: HDFS-5437 URL: https://issues.apache.org/jira/browse/HDFS-5437 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal There are a few more test failures in TestBlockReport caused by the earlier changes. testBlockReport_07 fails and it looks like a test issue. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5437) TestBlockReport fails due to test issues
[ https://issues.apache.org/jira/browse/HDFS-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5437: Summary: TestBlockReport fails due to test issues (was: Fix TestBlockReport) > TestBlockReport fails due to test issues > > > Key: HDFS-5437 > URL: https://issues.apache.org/jira/browse/HDFS-5437 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > > There are a few more test failures in TestBlockReport caused by the earlier > changes. testBlockReport_07 fails and it looks like a test issue. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5402) Deprecate the JSP web uis in HDFS
[ https://issues.apache.org/jira/browse/HDFS-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807085#comment-13807085 ] Haohui Mai commented on HDFS-5402: -- Just to clarify, the old and the new Web UIs can coexist. You can still access the old web UI using the same URLs until the JSPs are removed. > Deprecate the JSP web uis in HDFS > - > > Key: HDFS-5402 > URL: https://issues.apache.org/jira/browse/HDFS-5402 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai > > This JIRA tracks the discussion of transitioning from old, JSP web UIs to the > HTML 5 based web UIs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5402) Deprecate the JSP web uis in HDFS
[ https://issues.apache.org/jira/browse/HDFS-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807081#comment-13807081 ] Haohui Mai commented on HDFS-5402: -- bq. If we're going to remove the old web UI, I think the new web UI has to have the same level of unit testing. We shouldn't go backwards in terms of unit testing. I took a look at TestNamenodeJspHelper / TestDatanodeJspHelper / TestClusterJspHelper. It seems to me that we can merge these tests with the unit tests on JMX. bq. If we are going to remove this capability, we need to add some other command-line tools to get the same functionality. These tools could use REST if we have that, or JMX, but they need to exist before we can consider removing the old UI. This is a good point. Since all the information is available through JMX, the easiest way to approach it is to write some scripts using Node.js. The architecture of the new Web UIs is ready for this. > Deprecate the JSP web uis in HDFS > - > > Key: HDFS-5402 > URL: https://issues.apache.org/jira/browse/HDFS-5402 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai > > This JIRA tracks the discussion of transitioning from old, JSP web UIs to the > HTML 5 based web UIs. -- This message was sent by Atlassian JIRA (v6.1#6144)
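As a side note on the JMX point: the NameNode already serves its beans as JSON over the {{/jmx}} servlet, so the same information is scriptable from any language, not only Node.js. A small sketch in plain Java (the host, port, and bean query below are placeholders, not prescribed by this JIRA):
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

/** Tiny example of reading NameNode metrics from the JSON /jmx servlet. */
public class JmxFetch {
  public static void main(String[] args) throws Exception {
    // Placeholder address; substitute your NameNode's HTTP host and port.
    URL url = new URL("http://namenode.example.com:50070/jmx"
        + "?qry=Hadoop:service=NameNode,name=NameNodeInfo");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON payload; parse with any JSON library
      }
    }
  }
}
{code}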
[jira] [Commented] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807064#comment-13807064 ] Colin Patrick McCabe commented on HDFS-5434: Most users I am aware of who use replication factor 1 do it because they don't want the overhead of a multi-datanode pipeline that writes to multiple datanodes. If we give them such a pipeline anyway, it's contrary to what replication factor 1 has always meant. If your proposed solution works for you, there is a way to do it without modifying HDFS at all. Simply write with replication=2, and then call setReplication on the file after closing it. It seems like maybe your concern has more to do with how gracefully we handle pipeline failures (currently, not very gracefully). But that's a separate issue (see HDFS-4504 for details.) > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the surviving data > node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. > This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
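The workaround Colin describes needs only public {{FileSystem}} calls. A minimal sketch, assuming a made-up path and payload (this is an illustration, not code attached to this JIRA):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteThenTrimReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/example.dat");   // example path, not from the JIRA

    // Write through a two-datanode pipeline so a single datanode failure can
    // be recovered, even though only one replica is wanted in the end.
    try (FSDataOutputStream out =
        fs.create(file, true, 4096, (short) 2, fs.getDefaultBlockSize(file))) {
      out.writeBytes("payload");
    }

    // After the file is closed, drop back to the intended replication of 1;
    // the namenode will prune the extra replica.
    fs.setReplication(file, (short) 1);
  }
}
{code}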
[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5436: - Attachment: HDFS-5436.001.patch > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch, HDFS-5436.001.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5402) Deprecate the JSP web uis in HDFS
[ https://issues.apache.org/jira/browse/HDFS-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807045#comment-13807045 ] Colin Patrick McCabe commented on HDFS-5402: This is a really interesting project, Haohui. I think it will make our web UI much nicer. I have a few concerns about removing the old web UI, however: * If we're going to remove the old web UI, I think the new web UI has to have the same level of unit testing. We shouldn't go backwards in terms of unit testing. * Most of the deployments of elinks and links out there don't support Javascript. This is just a reality of life when using CentOS 5 or 6, which many users are still using. I have used "links" to diagnose problems through the web UI in the past, in systems where access to the cluster was available only through telnet. If we are going to remove this capability, we need to add some other command-line tools to get the same functionality. These tools could use REST if we have that, or JMX, but they need to exist before we can consider removing the old UI. > Deprecate the JSP web uis in HDFS > - > > Key: HDFS-5402 > URL: https://issues.apache.org/jira/browse/HDFS-5402 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai > > This JIRA tracks the discussion of transitioning from old, JSP web UIs to the > HTML 5 based web UIs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5435) File append fails to initialize storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5435: - Hadoop Flags: Reviewed +1 patch looks good. > File append fails to initialize storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Fix For: Heterogeneous Storage (HDFS-2832) > > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5413) hdfs.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/HDFS-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807026#comment-13807026 ] Hudson commented on HDFS-5413: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4662 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4662/]) HDFS-5413. hdfs.cmd does not support passthrough to any arbitrary class. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536448) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs.cmd > hdfs.cmd does not support passthrough to any arbitrary class. > - > > Key: HDFS-5413 > URL: https://issues.apache.org/jira/browse/HDFS-5413 > Project: Hadoop HDFS > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0, 2.2.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0, 2.2.1 > > Attachments: HDFS-5413.1.patch, HDFS-5413.2.patch > > > The hdfs shell script supports passthrough to calling any arbitrary class if > the first argument is not one of the per-defined sub-commands. The > equivalent cmd script does not implement this and instead fails trying to do > a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807018#comment-13807018 ] Hadoop QA commented on HDFS-5436: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610601/HDFS-5436.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5292//console This message is automatically generated. > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5413) hdfs.cmd does not support passthrough to any arbitrary class.
[ https://issues.apache.org/jira/browse/HDFS-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5413: Resolution: Fixed Fix Version/s: 2.2.1 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this patch to trunk, branch-2, and branch-2.2. Thank you to Chuan and Arpit for the code reviews. > hdfs.cmd does not support passthrough to any arbitrary class. > - > > Key: HDFS-5413 > URL: https://issues.apache.org/jira/browse/HDFS-5413 > Project: Hadoop HDFS > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0, 2.2.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0, 2.2.1 > > Attachments: HDFS-5413.1.patch, HDFS-5413.2.patch > > > The hdfs shell script supports passthrough to calling any arbitrary class if > the first argument is not one of the per-defined sub-commands. The > equivalent cmd script does not implement this and instead fails trying to do > a labeled goto to the first argument. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5432) TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.
[ https://issues.apache.org/jira/browse/HDFS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807004#comment-13807004 ] Arpit Agarwal commented on HDFS-5432: - +1 for the patch. Verified results with and without your patch on Windows. > TestDatanodeJsp fails on Windows due to assumption that loopback address > resolves to host name localhost. > - > > Key: HDFS-5432 > URL: https://issues.apache.org/jira/browse/HDFS-5432 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, test >Affects Versions: 3.0.0, 2.2.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Trivial > Attachments: HDFS-5432.1.patch > > > As discussed in many previous issues, Windows differs from Unixes in that it > does not resolve the loopback address to hostname "localhost". Instead, the > host name remains unresolved as "127.0.0.1". {{TestDatanodeJsp}} fails on > Windows, because it attempts to assert a string match containing "localhost" > as the host name. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5436: - Attachment: (was: HDFS-5436.000.patch) > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5436: - Attachment: HDFS-5436.000.patch There are no functionality changes in this patch. > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5436: - Status: Patch Available (was: Open) > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
[ https://issues.apache.org/jira/browse/HDFS-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5436: - Attachment: HDFS-5436.000.patch > Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web > > > Key: HDFS-5436 > URL: https://issues.apache.org/jira/browse/HDFS-5436 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5436.000.patch > > > Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in > different packages. This force several methods in ByteInputStream and > URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5436) Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web
Haohui Mai created HDFS-5436: Summary: Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web Key: HDFS-5436 URL: https://issues.apache.org/jira/browse/HDFS-5436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Currently HsftpFilesystem, HftpFileSystem and WebHdfsFileSystem reside in different packages. This forces several methods in ByteInputStream and URLConnectionFactory to be public methods. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5435) File append fails to initialize storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5435: Release Note: (was: Thanks for the patch Junping. I have committed it to branch HDFS-2832.) > File append fails to initialize storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Fix For: Heterogeneous Storage (HDFS-2832) > > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HDFS-5435) File append fails to initialize storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-5435. - Resolution: Fixed Fix Version/s: Heterogeneous Storage (HDFS-2832) Release Note: Thanks for the patch Junping. I have committed it to branch HDFS-2832. > File append fails to initialize storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Fix For: Heterogeneous Storage (HDFS-2832) > > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5435) File append fails to initialize storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806968#comment-13806968 ] Arpit Agarwal commented on HDFS-5435: - Thanks for the patch Junping. I have committed it to branch HDFS-2832. > File append fails to initialize storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Fix For: Heterogeneous Storage (HDFS-2832) > > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5435) File append fails to initialize storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5435: Summary: File append fails to initialize storageIDs (was: Fix file append without setting storageIDs) > File append fails to initialize storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5435) Fix file append without setting storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806959#comment-13806959 ] Arpit Agarwal commented on HDFS-5435: - +1 for the patch, I will commit it shortly. > Fix file append without setting storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5435) Fix file append without setting storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806923#comment-13806923 ] Junping Du commented on HDFS-5435: -- Verified that the patch fixes several failures in the append-related unit tests. > Fix file append without setting storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5435) Fix file append without setting storageIDs
[ https://issues.apache.org/jira/browse/HDFS-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-5435: - Attachment: HDFS-5435.patch > Fix file append without setting storageIDs > -- > > Key: HDFS-5435 > URL: https://issues.apache.org/jira/browse/HDFS-5435 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Heterogeneous Storage (HDFS-2832) >Reporter: Junping Du >Assignee: Junping Du > Attachments: HDFS-5435.patch > > > Several NPE exceptions in append related operations is because forget setting > storageIDs in initiate DataStreamer. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5435) Fix file append without setting storageIDs
Junping Du created HDFS-5435: Summary: Fix file append without setting storageIDs Key: HDFS-5435 URL: https://issues.apache.org/jira/browse/HDFS-5435 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Junping Du Assignee: Junping Du Several NPEs occur in append-related operations because the storageIDs are not set when the DataStreamer is initialized. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5434: Description: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as {{dfs.namenode.minPipelineSize}}. This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by ensuring that the pipeline size passed to {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: {code} max(replication, ${dfs.namenode.minPipelineSize}) {code} was: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as {{dfs.namenode.minPipelineSize}}. This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by ensuring that the pipeline size passed to {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: {code} max(replication, ${dfs.namenode.minPipelineSize}) {code} > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the other data node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. 
This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. > This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5434: Description: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the surviving data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as {{dfs.namenode.minPipelineSize}}. This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by ensuring that the pipeline size passed to {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: {code} max(replication, ${dfs.namenode.minPipelineSize}) {code} was: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as {{dfs.namenode.minPipelineSize}}. This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by ensuring that the pipeline size passed to {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: {code} max(replication, ${dfs.namenode.minPipelineSize}) {code} > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the surviving data > node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. 
This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. > This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5434: Description: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as {{dfs.namenode.minPipelineSize}}. This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by ensuring that the pipeline size passed to {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: {code} max(replication, ${dfs.namenode.minPipelineSize}) {code} was: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as dfs.namenode.minPipelineSize. This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the replication parameter is at least: {code:java} max(replication, ${dfs.namenode.minPipelineSize}) {code} > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the other data node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. 
This results in the intended replica count of one for the block. > This behavior should be controlled by a configuration option such as > {{dfs.namenode.minPipelineSize}}. > This behavior can be implemented in {{FSNameSystem.getAdditionalBlock()}} by > ensuring that the pipeline size passed to > {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is: > {code} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5434) Write resiliency for replica count 1
[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5434: Description: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as dfs.namenode.minPipelineSize. This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the replication parameter is at least: {code:java} max(replication, ${dfs.namenode.minPipelineSize}) {code} was: If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as dfs.namenode.minPipelineSize. This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the replication parameter is at least: max(replication, ${dfs.namenode.minPipelineSize}) > Write resiliency for replica count 1 > > > Key: HDFS-5434 > URL: https://issues.apache.org/jira/browse/HDFS-5434 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0 >Reporter: Buddy >Priority: Minor > > If a file has a replica count of one, the HDFS client is exposed to write > failures if the data node fails during a write. With a pipeline of size of > one, no recovery is possible if the sole data node dies. > A simple fix is to force a minimum pipeline size of 2, while leaving the > replication count as 1. The implementation for this is fairly non-invasive. > Although the replica count is one, the block will be written to two data > nodes instead of one. If one of the data nodes fails during the write, normal > pipeline recovery will ensure that the write succeeds to the other data node. > The existing code in the name node will prune the extra replica when it > receives the block received reports for the finalized block from both data > nodes. This results in the intended replica count of one for the block. 
> This behavior should be controlled by a configuration option such as > dfs.namenode.minPipelineSize. > This behavior can be implemented in FSNameSystem.getAdditionalBlock by > ensuring that the pipeline size passed to BlockPlacementPolicy.chooseTarget > in the replication parameter is at least: > {code:java} > max(replication, ${dfs.namenode.minPipelineSize}) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5434) Write resiliency for replica count 1
Buddy created HDFS-5434: --- Summary: Write resiliency for replica count 1 Key: HDFS-5434 URL: https://issues.apache.org/jira/browse/HDFS-5434 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Buddy Priority: Minor If a file has a replica count of one, the HDFS client is exposed to write failures if the data node fails during a write. With a pipeline of size of one, no recovery is possible if the sole data node dies. A simple fix is to force a minimum pipeline size of 2, while leaving the replication count as 1. The implementation for this is fairly non-invasive. Although the replica count is one, the block will be written to two data nodes instead of one. If one of the data nodes fails during the write, normal pipeline recovery will ensure that the write succeeds to the other data node. The existing code in the name node will prune the extra replica when it receives the block received reports for the finalized block from both data nodes. This results in the intended replica count of one for the block. This behavior should be controlled by a configuration option such as dfs.namenode.minPipelineSize. This behavior can be implemented in FSNameSystem.getAdditionalBlock by ensuring that the pipeline size passed to BlockPlacementPolicy.chooseTarget in the replication parameter is at least: max(replication, ${dfs.namenode.minPipelineSize}) -- This message was sent by Atlassian JIRA (v6.1#6144)
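A small self-contained sketch of the proposed sizing rule, assuming the hypothetical {{dfs.namenode.minPipelineSize}} key described above (the key does not exist in HDFS today, and this is not actual FSNamesystem code):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class MinPipelineSizeSketch {
  // Hypothetical key proposed in this JIRA; it is not a real HDFS setting.
  static final String MIN_PIPELINE_SIZE_KEY = "dfs.namenode.minPipelineSize";

  /** Number of datanode targets to request from the placement policy. */
  static int pipelineSize(Configuration conf, short replication) {
    int minPipelineSize = conf.getInt(MIN_PIPELINE_SIZE_KEY, 1);
    return Math.max(replication, minPipelineSize);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt(MIN_PIPELINE_SIZE_KEY, 2);
    // A replication-1 file would still get a two-node write pipeline; the
    // namenode prunes the extra replica once the block is finalized.
    System.out.println(pipelineSize(conf, (short) 1));  // prints 2
  }
}
{code}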
[jira] [Commented] (HDFS-5215) dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space
[ https://issues.apache.org/jira/browse/HDFS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806756#comment-13806756 ] Hadoop QA commented on HDFS-5215: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610538/HDFS-5215.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5291//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5291//console This message is automatically generated. > dfs.datanode.du.reserved is not taking effect as it's not considered while > getting the available space > -- > > Key: HDFS-5215 > URL: https://issues.apache.org/jira/browse/HDFS-5215 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 3.0.0 > > Attachments: HDFS-5215.patch > > > {code}public long getAvailable() throws IOException { > long remaining = getCapacity()-getDfsUsed(); > long available = usage.getAvailable(); > if (remaining > available) { > remaining = available; > } > return (remaining > 0) ? remaining : 0; > } > {code} > Here we are not considering the reserved space while getting the Available > Space. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5215) dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space
[ https://issues.apache.org/jira/browse/HDFS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-5215: --- Attachment: HDFS-5215.patch > dfs.datanode.du.reserved is not taking effect as it's not considered while > getting the available space > -- > > Key: HDFS-5215 > URL: https://issues.apache.org/jira/browse/HDFS-5215 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 3.0.0 > > Attachments: HDFS-5215.patch > > > {code}public long getAvailable() throws IOException { > long remaining = getCapacity()-getDfsUsed(); > long available = usage.getAvailable(); > if (remaining > available) { > remaining = available; > } > return (remaining > 0) ? remaining : 0; > } > {code} > Here we are not considering the reserved space while getting the Available > Space. -- This message was sent by Atlassian JIRA (v6.1#6144)
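For context, the fix the description points toward is to subtract the configured reserved bytes when reporting available space. A simplified sketch that mirrors the quoted method, assuming {{reserved}} holds the volume's {{dfs.datanode.du.reserved}} value (this is only an illustration, not the attached patch):
{code:java}
// Illustrative sketch only: account for dfs.datanode.du.reserved when
// computing available space, rather than returning the raw disk free space.
public long getAvailable() throws IOException {
  long remaining = getCapacity() - getDfsUsed();
  // usage.getAvailable() reports raw free space on the disk, so the
  // reserved bytes have to be subtracted from it here.
  long available = usage.getAvailable() - reserved;
  if (remaining > available) {
    remaining = available;
  }
  return (remaining > 0) ? remaining : 0;
}
{code}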