[jira] Updated: (HDFS-1104) Fsck triggers full GC on NameNode
[ https://issues.apache.org/jira/browse/HDFS-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1104: - Hadoop Flags: [Reviewed] +1 patch looks good. Thanks, Hairong. > Fsck triggers full GC on NameNode > - > > Key: HDFS-1104 > URL: https://issues.apache.org/jira/browse/HDFS-1104 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Fix For: 0.22.0 > > Attachments: fsckATime.patch, fsckATime1.patch, fsckATime2.patch > > > A NameNode at one of our clusters fell into full GC while fsck was performed. > Digging into the problem shows that it is caused by how the NameNode handles the > access time of a file. > Fsck calls open on every file in the checked directory to get the file's > block locations. Each open changes the file's access time and then leads to > writing a transaction entry to the edit log. The current code optimizes open > so that it returns without synchronizing the edit log to the disk. It > happened that in our cluster no other jobs were running while fsck was > performed. No edit log sync was ever called. So all open transactions were > kept in memory. When the edit log buffer got full, it automatically doubled > its space by allocating a new buffer. Full GC happened when no contiguous > space was found when allocating a new, bigger buffer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
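The failure mode described above can be sketched in a few lines. This is an illustrative toy, not the actual FSEditLog code: a buffer that doubles its backing array whenever it fills up. If nothing ever drains it, each doubling demands a fresh contiguous array twice the old size, which on a fragmented old-generation heap is exactly the kind of allocation that forces a full GC.

```java
import java.util.Arrays;

// Toy model of an edit-log buffer that doubles when full (names are
// illustrative, not from the HDFS source).
class GrowingEditBuffer {
    private byte[] buf = new byte[4];
    private int len = 0;

    void write(byte[] entry) {
        while (len + entry.length > buf.length) {
            // Allocate a new contiguous array of double the capacity and copy.
            // With no sync() ever called, this repeats until allocation fails
            // to find contiguous space, triggering a full GC.
            buf = Arrays.copyOf(buf, buf.length * 2);
        }
        System.arraycopy(entry, 0, buf, len, entry.length);
        len += entry.length;
    }

    int capacity() { return buf.length; }

    // Syncing drains the buffer; during the fsck run this never happened.
    void sync() { len = 0; }
}
```

Ten 3-byte "transactions" already force the toy buffer through three doublings (4 → 8 → 16 → 32 bytes); at NameNode scale the same growth pattern reaches hundreds of megabytes.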
[jira] Commented: (HDFS-609) Create a file with the append flag does not work in HDFS
[ https://issues.apache.org/jira/browse/HDFS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862544#action_12862544 ] Todd Lipcon commented on HDFS-609: -- I disagree - I don't think these are addressed in trunk. #1) the APPEND flag seems to track through to startFileInternal in FSNamesystem, which as Hairong mentioned just converts the INode but does not properly pass back a LocatedBlock for the last block, or convert it to under-construction status. #2) There still don't seem to be any checks that prevent a user from passing blocksize or replication when CreateFlag.APPEND is specified > Create a file with the append flag does not work in HDFS > > > Key: HDFS-609 > URL: https://issues.apache.org/jira/browse/HDFS-609 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Hairong Kuang >Priority: Blocker > Fix For: 0.21.0 > > > HADOOP-5438 introduced a create API with flags. There are a couple of issues > when the flag is set to be APPEND. > 1. The APPEND flag does not work in HDFS. Append is not as simple as changing > a FileINode to be a FileINodeUnderConstruction. It also needs to reopen the > last block for append if the last block is not full, and to handle the crc when the last > crc chunk is not full. > 2. The API is not well thought out. It has parameters like replication factor and > blockSize. Those parameters do not make any sense if the APPEND flag is set. But > they give an application user the wrong impression that append could change a > file's block size and replication factor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1123) Need HDFS Protocol Specification
[ https://issues.apache.org/jira/browse/HDFS-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862539#action_12862539 ] Todd Lipcon commented on HDFS-1123: --- Absolutely agree, we should document semantics. I guess my suggestion is that we do the two tasks separately - in the short term do a quick brushup of what we've got now, and in parallel start working on documentation of the semantics, etc. > Need HDFS Protocol Specification > > > Key: HDFS-1123 > URL: https://issues.apache.org/jira/browse/HDFS-1123 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: bc Wong > > It'd be great to document (in a spec, not in the code) the HDFS wire protocol: > * The layout of the different request and reply messages. > * The semantics of the various calls. > * The semantics of the various fields. > For example, I stumbled upon the goldmine of comments around > DataNode.java:1150. It looks correct, but the version number of 9 doesn't > inspire confidence that it's up-to-date. (It's also a random place to put > such an important comment.) > Having a formal spec is a big step forward for compatibility. It also > highlights design decisions and helps with protocol evolution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1123) Need HDFS Protocol Specification
[ https://issues.apache.org/jira/browse/HDFS-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862538#action_12862538 ] bc Wong commented on HDFS-1123: --- Doesn't it still apply after moving to Avro? Avroization makes the layout documentation easier. It doesn't describe the semantics of the protocol, which is the interesting part. > Need HDFS Protocol Specification > > > Key: HDFS-1123 > URL: https://issues.apache.org/jira/browse/HDFS-1123 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: bc Wong > > It'd be great to document (in a spec, not in the code) the HDFS wire protocol: > * The layout of the different request and reply messages. > * The semantics of the various calls. > * The semantics of the various fields. > For example, I stumbled upon the goldmine of comments around > DataNode.java:1150. It looks correct, but the version number of 9 doesn't > inspire confidence that it's up-to-date. (It's also a random place to put > such an important comment.) > Having a formal spec is a big step forward for compatibility. It also > highlights design decisions and helps with protocol evolution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1123) Need HDFS Protocol Specification
[ https://issues.apache.org/jira/browse/HDFS-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862535#action_12862535 ] Todd Lipcon commented on HDFS-1123: --- I agree that we should do a better job of documenting the current protocol, but we shouldn't spend *too* much time on it, since everyone is in agreement that we'd like to move to Avro this year. A quick pass to update the comments is probably worth doing, but a formal spec may be overkill for a protocol we plan to deprecate imminently. > Need HDFS Protocol Specification > > > Key: HDFS-1123 > URL: https://issues.apache.org/jira/browse/HDFS-1123 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: bc Wong > > It'd be great to document (in a spec, not in the code) the HDFS wire protocol: > * The layout of the different request and reply messages. > * The semantics of the various calls. > * The semantics of the various fields. > For example, I stumbled upon the goldmine of comments around > DataNode.java:1150. It looks correct, but the version number of 9 doesn't > inspire confidence that it's up-to-date. (It's also a random place to put > such an important comment.) > Having a formal spec is a big step forward for compatibility. It also > highlights design decisions and helps with protocol evolution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1122) client block verification may result in blocks in DataBlockScanner prematurely
[ https://issues.apache.org/jira/browse/HDFS-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1122: --- Attachment: hdfs-1122-for-0.20.txt patch that works on 0.20 > client block verification may result in blocks in DataBlockScanner prematurely > -- > > Key: HDFS-1122 > URL: https://issues.apache.org/jira/browse/HDFS-1122 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: sam rash >Assignee: sam rash > Attachments: hdfs-1122-for-0.20.txt > > > found that when the DN uses client verification of a block that is open for > writing, it will add it to the DataBlockScanner prematurely. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862509#action_12862509 ] Konstantin Shvachko commented on HDFS-1114: --- Do you have an estimate on how much space this will save in NN's memory footprint? > Reducing NameNode memory usage by an alternate hash table > - > > Key: HDFS-1114 > URL: https://issues.apache.org/jira/browse/HDFS-1114 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > > NameNode uses a java.util.HashMap to store BlockInfo objects. When there are > many blocks in HDFS, this map uses a lot of memory in the NameNode. We may > optimize the memory usage by a light weight hash table implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
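The issue above proposes replacing java.util.HashMap, which allocates a wrapper Entry object per element, with a lighter-weight table. One common way to save that per-element overhead is an intrusive chained hash table, where each stored element carries its own "next" pointer so the table itself is just one array of list heads. The sketch below is a minimal illustration of that idea under assumed names; it is not the eventual HDFS implementation.

```java
// Minimal intrusive chained hash table keyed by a long (e.g. a block ID).
// Elements embed their own chain pointer, so put() allocates nothing.
class LightMap {
    static class Entry {
        final long key;
        Entry next;          // intrusive chaining: no per-entry wrapper object
        Entry(long key) { this.key = key; }
    }

    private final Entry[] buckets;

    LightMap(int capacity) { buckets = new Entry[capacity]; }

    private int index(long key) {
        // Fold the long into a non-negative bucket index.
        return (int) ((key ^ (key >>> 32)) & 0x7fffffff) % buckets.length;
    }

    void put(Entry e) {
        int i = index(e.key);
        e.next = buckets[i];
        buckets[i] = e;
    }

    Entry get(long key) {
        for (Entry e = buckets[index(key)]; e != null; e = e.next) {
            if (e.key == key) return e;
        }
        return null;
    }
}
```

Compared to HashMap, this saves roughly one object header plus two references per element, which is where Konstantin's footprint question would be answered: savings scale linearly with the number of blocks.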
[jira] Updated: (HDFS-1110) Namenode heap optimization - reuse objects for commonly used file names
[ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1110: -- Attachment: hdfs-1110.2.patch bq. What are the names of these 24 files? Do they fall under the proposed default pattern. How big is the noise if we use the default pattern. Of 24, 22 are part-* files. bq. we need to optimize only for the top ten (or so) file names, which will give us 5% saving in the meta-data memory footprint I do not think the top 10 will save 5% of the meta-data memory footprint. See the posted results below. I had a bug in my previous calculation that made the savings seem too good to be true. With 47 million files optimized to use the dictionary, the saving of 10 bytes gives 470MB and not 4.7GB :-) Also I did not account for the byte[] overhead of 24 bytes. Anyway, I have a tool NamespaceDedupe with the new patch. You could run it on an fsimage to see the frequency of occurrence and savings in heap size. Dhruba, you can run this on images from your production cluster to see how the savings compare with what I have posted below. 23 names are used by 3343781 between 10 and 360461 times. Saved space 114962311. 468 names are used by 12944154 between 1 and 10 times. Saved space 448255164. 4335 names are used by 10522601 between 1000 and 1 times. Saved space 391364352. 40031 names are used by 10654372 between 100 and 1000 times. Saved space 382273386. 403974 names are used by 10722689 between 10 and 100 times. Saved space 354416484. Total saved space 1691271697. > Namenode heap optimization - reuse objects for commonly used file names > --- > > Key: HDFS-1110 > URL: https://issues.apache.org/jira/browse/HDFS-1110 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: hdfs-1110.2.patch, hdfs-1110.patch > > > There are a lot of common file names used in HDFS, mainly created by > mapreduce, such as file names starting with "part". 
Reusing byte[] > corresponding to these recurring file names will save significant heap space > used for storing the file names in millions of INodeFile objects. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
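The byte[] reuse described above amounts to interning common file-name components: a small dictionary hands out one shared byte[] per frequent name, so millions of INodes holding "part-00000"-style names all point at the same array instead of each keeping a private copy. The sketch below illustrates the idea; the class and method names are assumptions, not the patch's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative name dictionary: returns one shared byte[] per distinct
// file name, so repeated names cost a map entry instead of a fresh array
// per INode.
class NameDictionary {
    private final Map<String, byte[]> interned = new HashMap<>();

    byte[] intern(String name) {
        byte[] shared = interned.get(name);
        if (shared == null) {
            shared = name.getBytes(StandardCharsets.UTF_8);
            interned.put(name, shared);
        }
        return shared;
    }
}
```

In a real NameNode one would bound or filter the dictionary (e.g. only names matching a configured pattern such as part-*), since interning every rare name just moves the memory into the map, which matches the 24-byte-per-array overhead caveat in the comment above.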
[jira] Created: (HDFS-1123) Need HDFS Protocol Specification
Need HDFS Protocol Specification Key: HDFS-1123 URL: https://issues.apache.org/jira/browse/HDFS-1123 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: bc Wong It'd be great to document (in a spec, not in the code) the HDFS wire protocol: * The layout of the different request and reply messages. * The semantics of the various calls. * The semantics of the various fields. For example, I stumbled upon the goldmine of comments around DataNode.java:1150. It looks correct, but the version number of 9 doesn't inspire confidence that it's up-to-date. (It's also a random place to put such an important comment.) Having a formal spec is a big step forward for compatibility. It also highlights design decisions and helps with protocol evolution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1107) Turn on append by default.
[ https://issues.apache.org/jira/browse/HDFS-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1107: -- Attachment: appendOn.patch This patch turns append on by default. But there is still a way to turn it off. The next radical step is to remove all checks in the code for whether append is supported. I'll file another jira for that. It can be done later in 0.22. > Turn on append by default. > -- > > Key: HDFS-1107 > URL: https://issues.apache.org/jira/browse/HDFS-1107 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.21.0, 0.22.0 >Reporter: Konstantin Shvachko >Priority: Blocker > Fix For: 0.21.0 > > Attachments: appendOn.patch > > > hdfs-default.xml still has the old default value {{dfs.support.append = > false}}. It should be changed to {{true}}, or removed from the default > configuration and treated as {{true}} if not found. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1107) Turn on append by default.
[ https://issues.apache.org/jira/browse/HDFS-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1107: -- Status: Patch Available (was: Open) Assignee: Konstantin Shvachko > Turn on append by default. > -- > > Key: HDFS-1107 > URL: https://issues.apache.org/jira/browse/HDFS-1107 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.21.0, 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Blocker > Fix For: 0.21.0 > > Attachments: appendOn.patch > > > hdfs-default.xml still has the old default value {{dfs.support.append = > false}}. It should be changed to {{true}}, or removed from the default > configuration and treated as {{true}} if not found. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1118) DFSOutputStream socket leak when cannot connect to DataNode
[ https://issues.apache.org/jira/browse/HDFS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-1118: - Status: Patch Available (was: Open) > DFSOutputStream socket leak when cannot connect to DataNode > --- > > Key: HDFS-1118 > URL: https://issues.apache.org/jira/browse/HDFS-1118 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2, 0.20.1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-1118.1.patch, HDFS-1118.2.patch > > > The offending code is in {{DFSOutputStream.nextBlockOutputStream}} > This function retries several times to call {{createBlockOutputStream}}. Each > time when it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}. > That object is never closed, but overwritten the next time > {{createBlockOutputStream}} is called. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1118) DFSOutputStream socket leak when cannot connect to DataNode
[ https://issues.apache.org/jira/browse/HDFS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-1118: - Attachment: HDFS-1118.2.patch Moved the cleanup to finally section. > DFSOutputStream socket leak when cannot connect to DataNode > --- > > Key: HDFS-1118 > URL: https://issues.apache.org/jira/browse/HDFS-1118 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-1118.1.patch, HDFS-1118.2.patch > > > The offending code is in {{DFSOutputStream.nextBlockOutputStream}} > This function retries several times to call {{createBlockOutputStream}}. Each > time when it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}. > That object is never closed, but overwritten the next time > {{createBlockOutputStream}} is called. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
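The shape of the fix ("moved the cleanup to finally") is the standard pattern for retry loops that allocate a resource per attempt: each attempt's socket is closed in a finally block unless it was successfully handed off, so a failed createBlockOutputStream can no longer leave an orphaned Socket behind. The sketch below uses hypothetical names and is not the patch itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative retry loop: one Socket per attempt, closed in finally on
// failure instead of being overwritten (and leaked) by the next attempt.
class RetryingConnector {
    static Socket connectWithRetries(InetSocketAddress[] targets, int timeoutMs)
            throws IOException {
        IOException last = null;
        for (InetSocketAddress target : targets) {
            Socket s = new Socket();
            boolean handedOff = false;
            try {
                s.connect(target, timeoutMs);
                handedOff = true;
                return s;       // caller now owns the socket
            } catch (IOException e) {
                last = e;       // remember the failure, try the next target
            } finally {
                if (!handedOff) {
                    s.close();  // the bug: this close used to be missing
                }
            }
        }
        throw last != null ? last : new IOException("no targets available");
    }
}
```

The `handedOff` flag keeps the finally block from closing a socket that was successfully returned, which is the subtle part of doing cleanup in finally around an early return.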
[jira] Updated: (HDFS-1118) DFSOutputStream socket leak when cannot connect to DataNode
[ https://issues.apache.org/jira/browse/HDFS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HDFS-1118: - Status: Open (was: Patch Available) > DFSOutputStream socket leak when cannot connect to DataNode > --- > > Key: HDFS-1118 > URL: https://issues.apache.org/jira/browse/HDFS-1118 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2, 0.20.1 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: HDFS-1118.1.patch, HDFS-1118.2.patch > > > The offending code is in {{DFSOutputStream.nextBlockOutputStream}} > This function retries several times to call {{createBlockOutputStream}}. Each > time when it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}. > That object is never closed, but overwritten the next time > {{createBlockOutputStream}} is called. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1105) Balancer improvement
[ https://issues.apache.org/jira/browse/HDFS-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862470#action_12862470 ] Hairong Kuang commented on HDFS-1105: - Thanks, Dmytro, for uploading a new patch. I really like the changes you made! Here are more review comments: # The major contribution of the patch is that it enforces the max time for each iteration, including the waiting time for moves to complete. I prefer the structure of dispatchBlockMove to be {code} { long startTime = Util.now(); start threads to schedule & dispatch block moves; pass startTime to each thread as you do in your patch; waitForMoveCompletion(startTime); // pass startTime as well; return when the max iteration time is reached }{code} In this way, you do not need to introduce a new heuristic for waitForMoveCompletion to quit, as you do in your patch. # I prefer PendingBlockMove#closeSocket() to call sock.close() instead of closing only its input stream. I understand that the finally section of receiveResponse() closes the socket. However, it is nice to release all its resources in one shot even in PendingBlockMove#closeSocket(). receiveResponse() should catch EOFException before catching IOException to avoid printing two log messages for one exception. The log message for EOFException should simply say EOFException, because sometimes it may not be caused by PendingBlockMove#closeSocket(). Other minor comments: # should remove unused imports; # MAX_NUM_CONCURRENT_MOVE should not drop the modifier "final"; # should keep all option parsing & balancer initialization in one method "init"; # Replace timeToStr with your new time format and call timeToStr(timeLeft) in Balancer#run(); # It is not user friendly to print the exception stack trace on the screen. 
> Balancer improvement > > > Key: HDFS-1105 > URL: https://issues.apache.org/jira/browse/HDFS-1105 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Dmytro Molkov >Assignee: Dmytro Molkov > Attachments: HDFS-1105.2.patch, HDFS-1105.3.patch, HDFS-1105.patch > > > We were seeing some weird issues with the balancer in our cluster: > 1) it can get stuck during an iteration and only restarting it helps > 2) the iterations are highly inefficient. With 20 minutes iteration it moves > 7K blocks a minute for the first 6 minutes and hundreds of blocks in the next > 14 minutes > 3) it can hit namenode and the network pretty hard > A few improvements we came up with as a result: > Making balancer more deterministic in terms of running time of iteration, > improving the efficiency and making the load configurable: > Make many of the constants configurable command line parameters: Iteration > length, number of blocks to move in parallel to a given node and in cluster > overall. > Terminate transfers that are still in progress after iteration is over. > Previously iteration time was the time window in which the balancer was > scheduling the moves and then it would wait for the moves to finish > indefinitely. Each scheduling task can run up to iteration time or even > longer. This means if you have too many of them and they are long your actual > iterations are longer than 20 minutes. Now each scheduling task has a time of > the start of iteration and it should schedule the moves only if it did not > run out of time. So the tasks that have started after the iteration is over > will not schedule any moves. > The number of move threads and dispatch threads is configurable so that > depending on the load of the cluster you can run it slower. > I will attach a patch, please let me know what you think and what can be done > better. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
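The review suggestion above boils down to one shared deadline: the iteration start time is handed to every scheduling thread and to waitForMoveCompletion, and each simply checks the remaining budget rather than using a separate quit heuristic. A minimal sketch of that shared clock, with illustrative names:

```java
// Illustrative shared iteration deadline for the balancer: scheduling
// threads and the completion wait both consult the same start time and
// budget, so the whole iteration is bounded by one limit.
class IterationClock {
    private final long startTime;
    private final long maxIterationMillis;

    IterationClock(long startTime, long maxIterationMillis) {
        this.startTime = startTime;
        this.maxIterationMillis = maxIterationMillis;
    }

    // A dispatcher thread schedules another block move only while time remains;
    // threads started after the deadline schedule nothing.
    boolean shouldSchedule(long now) {
        return now - startTime < maxIterationMillis;
    }

    // waitForMoveCompletion waits at most this long, so in-flight moves are
    // abandoned rather than waited on indefinitely.
    long remainingMillis(long now) {
        return Math.max(0, maxIterationMillis - (now - startTime));
    }
}
```

With this structure, a 20-minute iteration really ends after 20 minutes: late scheduling tasks become no-ops and the completion wait times out with the same budget.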
[jira] Commented: (HDFS-1107) Turn on append by default.
[ https://issues.apache.org/jira/browse/HDFS-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862466#action_12862466 ] Eli Collins commented on HDFS-1107: --- +1 > Turn on append by default. > -- > > Key: HDFS-1107 > URL: https://issues.apache.org/jira/browse/HDFS-1107 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.21.0, 0.22.0 >Reporter: Konstantin Shvachko >Priority: Blocker > Fix For: 0.21.0 > > > hdfs-default.xml still has the old default value {{dfs.support.append = > false}}. It should be changed to {{true}}, or removed from the default > configuration and treated as {{true}} if not found. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1107) Turn on append by default.
[ https://issues.apache.org/jira/browse/HDFS-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862465#action_12862465 ] Jakob Homan commented on HDFS-1107: --- +1 > Turn on append by default. > -- > > Key: HDFS-1107 > URL: https://issues.apache.org/jira/browse/HDFS-1107 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.21.0, 0.22.0 >Reporter: Konstantin Shvachko >Priority: Blocker > Fix For: 0.21.0 > > > hdfs-default.xml still has the old default value {{dfs.support.append = > false}}. It should be changed to {{true}}, or removed from the default > configuration and treated as {{true}} if not found. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1107) Turn on append by default.
[ https://issues.apache.org/jira/browse/HDFS-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862458#action_12862458 ] Konstantin Shvachko commented on HDFS-1107: --- I am going to remove {{dfs.support.append}} from {{hdfs-default.xml}}, and change the default value of this variable to true in the code, if there are no other suggestions. This should not be treated as an incompatible change, as I cannot imagine programs that would strictly rely on append being unsupported and would fail if it suddenly is. > Turn on append by default. > -- > > Key: HDFS-1107 > URL: https://issues.apache.org/jira/browse/HDFS-1107 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.21.0, 0.22.0 >Reporter: Konstantin Shvachko >Priority: Blocker > Fix For: 0.21.0 > > > hdfs-default.xml still has the old default value {{dfs.support.append = > false}}. It should be changed to {{true}}, or removed from the default > configuration and treated as {{true}} if not found. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-995) Replace usage of FileStatus#isDir()
[ https://issues.apache.org/jira/browse/HDFS-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-995: - Issue Type: Bug (was: Improvement) Fix Version/s: 0.21.0 Affects Version/s: 0.20.3 0.21.0 (was: 0.22.0) Priority: Blocker (was: Major) > Replace usage of FileStatus#isDir() > --- > > Key: HDFS-995 > URL: https://issues.apache.org/jira/browse/HDFS-995 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.3, 0.21.0 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Blocker > Fix For: 0.21.0, 0.22.0 > > Attachments: hdfs-995-1.patch > > > HADOOP-6585 is going to deprecate FileStatus#isDir(). This jira is for > replacing all uses of isDir() in HDFS with checks of isDirectory(), isFile(), > or isSymlink() as needed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1107) Turn on append by default.
[ https://issues.apache.org/jira/browse/HDFS-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1107: -- Priority: Blocker (was: Major) I think it should be fixed for 0.21. > Turn on append by default. > -- > > Key: HDFS-1107 > URL: https://issues.apache.org/jira/browse/HDFS-1107 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.21.0, 0.22.0 >Reporter: Konstantin Shvachko >Priority: Blocker > Fix For: 0.21.0 > > > hdfs-default.xml still has the old default value {{dfs.support.append = > false}}. It should be changed to {{true}}, or removed from the default > configuration and treated as {{true}} if not found. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-829) hdfsJniHelper.c: #include is not portable
[ https://issues.apache.org/jira/browse/HDFS-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-829: - Status: Patch Available (was: Open) Affects Version/s: 0.21.0 0.22.0 Fix Version/s: 0.21.0 0.22.0 +1 Looks good to me for 21. > hdfsJniHelper.c: #include is not portable > --- > > Key: HDFS-829 > URL: https://issues.apache.org/jira/browse/HDFS-829 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0, 0.22.0 >Reporter: Allen Wittenauer > Fix For: 0.21.0, 0.22.0 > > Attachments: HDFS-632.patch, hdfs-829.patch > > > hdfsJniHelper.c includes but this appears to be unnecessary, since > even under Linux none of the routines that are prototyped are used. Worse > yet, error.h doesn't appear to be a standard header file so this breaks on > Mac OS X and Solaris and prevents libhdfs from being built. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-829) hdfsJniHelper.c: #include is not portable
[ https://issues.apache.org/jira/browse/HDFS-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-829: - Status: Open (was: Patch Available) > hdfsJniHelper.c: #include is not portable > --- > > Key: HDFS-829 > URL: https://issues.apache.org/jira/browse/HDFS-829 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Allen Wittenauer > Attachments: HDFS-632.patch, hdfs-829.patch > > > hdfsJniHelper.c includes but this appears to be unnecessary, since > even under Linux none of the routines that are prototyped are used. Worse > yet, error.h doesn't appear to be a standard header file so this breaks on > Mac OS X and Solaris and prevents libhdfs from being built. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-808) Implement something like PAR2 support?
[ https://issues.apache.org/jira/browse/HDFS-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-808. --- Resolution: Duplicate > Implement something like PAR2 support? > -- > > Key: HDFS-808 > URL: https://issues.apache.org/jira/browse/HDFS-808 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Allen Wittenauer >Priority: Minor > > We really need an Idea issue type, because I'm not sure if this is really > viable. :) Just sort of thinking "out loud". > I was thinking about how file recovery works on services like Usenet to fix > data corruption when chunks of files are missing. I wonder how hard it would > be to implement something like PAR2 [ http://en.wikipedia.org/wiki/Parchive ] > automatically for large files. We'd have the advantage of being able to do > it in binary of course and could likely hide the details within HDFS itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862444#action_12862444 ] Eli Collins commented on HDFS-941: -- bq. hadoop fs -put of a 1g file from n clients in parallel. I suspect this will improve, socket reuse should limit slow start, but good to check. I meant fs -get here since we're caching sockets on reads and not writes. I think the DFSInputStream currently creates a new socket for each block it fetches. > Datanode xceiver protocol should allow reuse of a connection > > > Key: HDFS-941 > URL: https://issues.apache.org/jira/browse/HDFS-941 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, hdfs client >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: bc Wong > Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, > HDFS-941-3.patch > > > Right now each connection into the datanode xceiver only processes one > operation. > In the case that an operation leaves the stream in a well-defined state (eg a > client reads to the end of a block successfully) the same connection could be > reused for a second operation. This should improve random read performance > significantly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-829) hdfsJniHelper.c: #include is not portable
[ https://issues.apache.org/jira/browse/HDFS-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862443#action_12862443 ] Allen Wittenauer commented on HDFS-829: --- It looks like libhdfs is in the hdfs tree in trunk. So this can get committed now, right? Can we get this in prior to the 0.21 cut over? > hdfsJniHelper.c: #include is not portable > --- > > Key: HDFS-829 > URL: https://issues.apache.org/jira/browse/HDFS-829 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Allen Wittenauer > Attachments: HDFS-632.patch, hdfs-829.patch > > > hdfsJniHelper.c includes but this appears to be unnecessary, since > even under Linux none of the routines that are prototyped are used. Worse > yet, error.h doesn't appear to be a standard header file so this breaks on > Mac OS X and Solaris and prevents libhdfs from being built. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862439#action_12862439 ] Eli Collins commented on HDFS-941: -- Hey bc, Nice change! Do you have any results from a non-random workload? Please collect: # before/after TestDFSIO runs so we can see if sequential throughput is affected # hadoop fs -put of a 1g file from n clients in parallel. I suspect this will improve; socket reuse should limit slow start, but good to check. How did you choose DEFAULT_CACHE_SIZE? In the exception handler in sendReadResult can we be more specific about when it's OK not to be able to send the result, and throw an exception in the cases when it's not OK rather than swallowing all IOExceptions? In DataXceiver#opReadBlock you throw an IOException in a try block that catches IOException. I think that should LOG.error and close the output stream. You can also chain the following if statements that check stat. How about asserting sock != null in putCachedSocket? Seems like this should never happen if the code is correct and it's easy to ignore logs. File a jira for ERROR_CHECKSUM? Please add a comment to the head of ReaderSocketCache explaining why we cache BlockReader socket pairs, as opposed to just caching sockets (because we don't multiplex BlockReaders over a single socket between hosts). Nits: * Nice comment in the BlockReader header, please define "packet" as well. Is the RPC specification in DataNode outdated? If so fix it or file a jira instead of warning readers it may be outdated. * Maybe better name for DN_KEEPALIVE_TIMEOUT since there is no explicit keepalive? TRANSFER_TIMEOUT? 
* Would rename workDone to something specific like opsProcessed or make it a boolean * Add an "a" in "with checksum" * if needs braces eg BlockReader#read Thanks, Eli > Datanode xceiver protocol should allow reuse of a connection > > > Key: HDFS-941 > URL: https://issues.apache.org/jira/browse/HDFS-941 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, hdfs client >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: bc Wong > Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, > HDFS-941-3.patch > > > Right now each connection into the datanode xceiver only processes one > operation. > In the case that an operation leaves the stream in a well-defined state (eg a > client reads to the end of a block successfully) the same connection could be > reused for a second operation. This should improve random read performance > significantly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-760) "fs -put" fails if dfs.umask is set to 63
[ https://issues.apache.org/jira/browse/HDFS-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan resolved HDFS-760. -- Resolution: Fixed This was fixed by HADOOP-6521. > "fs -put" fails if dfs.umask is set to 63 > - > > Key: HDFS-760 > URL: https://issues.apache.org/jira/browse/HDFS-760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Tsz Wo (Nicholas), SZE > Fix For: 0.21.0, 0.22.0 > > > Add the following to hdfs-site.xml > {noformat} > <property> > <name>dfs.umask</name> > <value>63</value> > </property> > {noformat} > Then run "hadoop fs -put" > {noformat} > -bash-3.1$ ./bin/hadoop fs -put README.txt r.txt > 09/11/09 23:09:07 WARN conf.Configuration: mapred.task.id is deprecated. > Instead, use mapreduce.task.attempt.id > put: 63 > Usage: java FsShell [-put ... ] > -bash-3.1$ > {noformat} > Observed the above behavior in 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1110) Namenode heap optimization - reuse objects for commonly used file names
[ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862431#action_12862431 ] Konstantin Shvachko commented on HDFS-1110: --- bq. File names used > 10 times 24 What are the names of these 24 files? Do they fall under the proposed default pattern? How big is the noise if we use the default pattern? On the one hand I see the point of providing a generic approach for people to specify their own patterns. But I also agree with Dhruba that we need to optimize only for the top ten (or so) file names, which will give us a 5% saving in the meta-data memory footprint. The rest should be ignored; it would be a waste of resources to optimize for it. Your approach 2 would be a move in this direction. So maybe it would be useful to have the tool Jakob mentions (OIV-based), so that admins could run it offline on the image and get the top N frequently used names, with an estimate of how much space this saves. Then they will be able to formulate the regexp. Otherwise, it is going to be a painful guessing game. > Namenode heap optimization - reuse objects for commonly used file names > --- > > Key: HDFS-1110 > URL: https://issues.apache.org/jira/browse/HDFS-1110 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: hdfs-1110.patch > > > There are a lot of common file names used in HDFS, mainly created by > mapreduce, such as file names starting with "part". Reusing byte[] > corresponding to these recurring file names will save significant heap space > used for storing the file names in millions of INodeFile objects. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
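The optimization under discussion is effectively interning: file names matching a pattern of common names (e.g. the part-NNNNN outputs mapreduce creates) share one byte[] across all INodes. A toy sketch of the idea, not the attached patch (the class and the example pattern are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch of the HDFS-1110 idea: intern the byte[] for file
// names matching a pattern of common names, so millions of INodes can
// share one array instead of each holding its own copy.
class NameInterner {
    private final Pattern common;
    private final Map<String, byte[]> interned = new HashMap<>();

    NameInterner(String regex) { this.common = Pattern.compile(regex); }

    byte[] intern(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        if (!common.matcher(name).matches()) {
            return bytes; // uncommon name: keep its own copy
        }
        // common name: all callers share the same byte[] instance
        return interned.computeIfAbsent(name, k -> bytes);
    }
}
```

This also shows why the choice of pattern matters, as the comment argues: each name outside the pattern costs a map lookup and saves nothing, so matching only the top ten or so names captures most of the benefit.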
[jira] Updated: (HDFS-760) "fs -put" fails if dfs.umask is set to 63
[ https://issues.apache.org/jira/browse/HDFS-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated HDFS-760: --- Priority: Major (was: Blocker) Downgrading from blocker for 0.21. Looks like this is a corner case which has a workaround (use octal). > "fs -put" fails if dfs.umask is set to 63 > - > > Key: HDFS-760 > URL: https://issues.apache.org/jira/browse/HDFS-760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Tsz Wo (Nicholas), SZE > Fix For: 0.21.0, 0.22.0 > > > Add the following to hdfs-site.xml > {noformat} > <property> > <name>dfs.umask</name> > <value>63</value> > </property> > {noformat} > Then run "hadoop fs -put" > {noformat} > -bash-3.1$ ./bin/hadoop fs -put README.txt r.txt > 09/11/09 23:09:07 WARN conf.Configuration: mapred.task.id is deprecated. > Instead, use mapreduce.task.attempt.id > put: 63 > Usage: java FsShell [-put ... ] > -bash-3.1$ > {noformat} > Observed the above behavior in 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-708) A stress-test tool for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862412#action_12862412 ] Konstantin Shvachko commented on HDFS-708: -- With respect to 14, I found the following solution.
{code}
public DataGenerator(FileSystem fs, Path fn) throws IOException {
  if(!(fs instanceof DistributedFileSystem)) {
    this.fileId = -1L;
    return;
  }
  DFSDataInputStream in = null;
  try {
    in = (DFSDataInputStream) ((DistributedFileSystem)fs).open(fn);
    this.fileId = in.getCurrentBlock().getBlockId();
  } finally {
    if(in != null) in.close();
  }
}
{code}
Right after creating a file for write you can get the id of the first block of the file and store it in {{DataGenerator.fileId}} - a new field. This id does not change across renames, and can be reliably used as a file-specific mix-in for the hash in data generation and verification. The data value of a file at a specific offset is then calculated as {{hash(fileId, offset)}}. > A stress-test tool for HDFS. > > > Key: HDFS-708 > URL: https://issues.apache.org/jira/browse/HDFS-708 > Project: Hadoop HDFS > Issue Type: New Feature > Components: test, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Joshua Harlow > Fix For: 0.22.0 > > Attachments: slive.patch, SLiveTest.pdf > > > It would be good to have a tool for automatic stress testing HDFS, which > would provide IO-intensive load on HDFS cluster. > The idea is to start the tool, let it run overnight, and then be able to > analyze possible failures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
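The scheme in the comment can be sketched end to end: every byte of a file is a pure function of its first block's id and the byte's offset, so generation and verification need no shared state beyond the block id. The mix function below is illustrative; the real tool may hash differently:

```java
// Illustrative sketch of the hash(fileId, offset) scheme: a file's
// contents are fully determined by (fileId, offset), so a verifier that
// knows the first block's id can recompute the expected bytes at any
// offset, even after the file has been renamed.
class SliveData {
    // A simple 64-bit mix (splitmix-style); an assumption, not SLive's hash.
    static byte valueAt(long fileId, long offset) {
        long h = fileId * 0x9E3779B97F4A7C15L + offset;
        h ^= (h >>> 33);
        h *= 0xFF51AFD7ED558CCDL;
        h ^= (h >>> 33);
        return (byte) h;
    }

    // Produce len bytes of file content starting at offset.
    static byte[] generate(long fileId, long offset, int len) {
        byte[] buf = new byte[len];
        for (int i = 0; i < len; i++) {
            buf[i] = valueAt(fileId, offset + i);
        }
        return buf;
    }

    // Recompute and compare: true iff data matches the expected content.
    static boolean verify(long fileId, long offset, byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] != valueAt(fileId, offset + i)) return false;
        }
        return true;
    }
}
```

Because the value depends only on (fileId, offset), a reader can verify any sub-range of the file independently, which is exactly what a long-running stress test needs.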
[jira] Updated: (HDFS-776) Fix exception handling in Balancer
[ https://issues.apache.org/jira/browse/HDFS-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated HDFS-776: --- Priority: Critical (was: Blocker) > Fix exception handling in Balancer > -- > > Key: HDFS-776 > URL: https://issues.apache.org/jira/browse/HDFS-776 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Owen O'Malley >Priority: Critical > Fix For: 0.21.0 > > > The Balancer's AccessKeyUpdater handles exceptions badly. In particular: > 1. Catching Exception too low. The wrapper around setKeys should only catch > IOException. > 2. InterruptedException is ignored. It should be caught at the top level and > exit run. > 3. Throwable is not caught. It should be caught at the top level and kill the > Balancer server process. > {code} > class AccessKeyUpdater implements Runnable { > public void run() { > while (shouldRun) { > try { > accessTokenHandler.setKeys(namenode.getAccessKeys()); > } catch (Exception e) { > LOG.error(StringUtils.stringifyException(e)); > } > try { > Thread.sleep(keyUpdaterInterval); > } catch (InterruptedException ie) { > } > } > } > } > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
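A sketch of the run() shape the three points above ask for: only IOException is caught around setKeys, InterruptedException exits run(), and Throwable kills the process. The KeySource/KeySink interfaces and terminate() are hypothetical stand-ins for the real Balancer types:

```java
import java.io.IOException;

// Sketch of the corrected handler structure; NOT the actual Balancer code.
class AccessKeyUpdater implements Runnable {
    volatile boolean shouldRun = true;
    private final KeySource namenode;          // hypothetical stand-in
    private final KeySink accessTokenHandler;  // hypothetical stand-in
    private final long keyUpdaterInterval;

    interface KeySource { Object getAccessKeys() throws IOException; }
    interface KeySink { void setKeys(Object keys) throws IOException; }

    AccessKeyUpdater(KeySource src, KeySink sink, long intervalMs) {
        this.namenode = src;
        this.accessTokenHandler = sink;
        this.keyUpdaterInterval = intervalMs;
    }

    public void run() {
        try {
            while (shouldRun) {
                try {
                    accessTokenHandler.setKeys(namenode.getAccessKeys());
                } catch (IOException e) {
                    // transient RPC failure: log and retry next round
                    System.err.println("key update failed: " + e);
                }
                Thread.sleep(keyUpdaterInterval);
            }
        } catch (InterruptedException ie) {
            // asked to stop: fall through and exit run()
        } catch (Throwable t) {
            terminate(t); // unexpected error: kill the server process
        }
    }

    void terminate(Throwable t) {
        // the real fix would exit the Balancer process here
        throw new Error(t);
    }
}
```

The key structural change is that the narrow IOException catch sits inside the loop while InterruptedException and Throwable are caught once at the top level, so an interrupt actually stops the thread instead of being swallowed each iteration.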
[jira] Updated: (HDFS-708) A stress-test tool for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua Harlow updated HDFS-708: --- Attachment: (was: slive.patch) > A stress-test tool for HDFS. > > > Key: HDFS-708 > URL: https://issues.apache.org/jira/browse/HDFS-708 > Project: Hadoop HDFS > Issue Type: New Feature > Components: test, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Joshua Harlow > Fix For: 0.22.0 > > Attachments: slive.patch, SLiveTest.pdf > > > It would be good to have a tool for automatic stress testing HDFS, which > would provide IO-intensive load on HDFS cluster. > The idea is to start the tool, let it run overnight, and then be able to > analyze possible failures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-708) A stress-test tool for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua Harlow updated HDFS-708: --- Attachment: slive.patch Updated for code comments. > A stress-test tool for HDFS. > > > Key: HDFS-708 > URL: https://issues.apache.org/jira/browse/HDFS-708 > Project: Hadoop HDFS > Issue Type: New Feature > Components: test, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Joshua Harlow > Fix For: 0.22.0 > > Attachments: slive.patch, slive.patch, SLiveTest.pdf > > > It would be good to have a tool for automatic stress testing HDFS, which > would provide IO-intensive load on HDFS cluster. > The idea is to start the tool, let it run overnight, and then be able to > analyze possible failures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-708) A stress-test tool for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862396#action_12862396 ] Joshua Harlow commented on HDFS-708: 1. Done 2. Done 3. These methods have meanings for null (mainly for default existence checks when merging), and for a random seed null means no seed, which is possible. Duration in milliseconds can return an int though. It's just that null has a meaning if the default value for a config option is a null object, which it is in a couple of cases. 4 & 5. Done (we are now measuring only the time around readByte and write()) 6. Done 7. Done 8. Done 9 & 10. Done 11. Done and most classes made package private 15. Will add some tests. > A stress-test tool for HDFS. > > > Key: HDFS-708 > URL: https://issues.apache.org/jira/browse/HDFS-708 > Project: Hadoop HDFS > Issue Type: New Feature > Components: test, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Joshua Harlow > Fix For: 0.22.0 > > Attachments: slive.patch, SLiveTest.pdf > > > It would be good to have a tool for automatic stress testing HDFS, which > would provide IO-intensive load on HDFS cluster. > The idea is to start the tool, let it run overnight, and then be able to > analyze possible failures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1079) HDFS implementation should throw exceptions defined in AbstractFileSystem
[ https://issues.apache.org/jira/browse/HDFS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862395#action_12862395 ] Suresh Srinivas commented on HDFS-1079: --- Eli, given that we need a test checking that the right exceptions are thrown from FileContext for various file systems, I have created HADOOP-6736 to capture that effort. The test is fairly involved and is better off done in a separate jira. Yes, we should throw HadoopIllegalArgumentException where currently IllegalArgumentExceptions are thrown. > HDFS implementation should throw exceptions defined in AbstractFileSystem > - > > Key: HDFS-1079 > URL: https://issues.apache.org/jira/browse/HDFS-1079 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: HDFS-1079.1.patch, HDFS-1079.patch, HDFS-1079.patch > > > HDFS implementation Hdfs.java should throw exceptions as defined in > AbstractFileSystem. To facilitate this, ClientProtocol should be changed to > throw specific exceptions, as defined in AbstractFileSystem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-801) Add SureLogic annotations' jar into Ivy and Eclipse configs
[ https://issues.apache.org/jira/browse/HDFS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862394#action_12862394 ] Hadoop QA commented on HDFS-801: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428002/hdfs_3.1.0.patch against trunk revision 939091. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/334/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/334/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/334/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/334/console This message is automatically generated. 
> Add SureLogic annotations' jar into Ivy and Eclipse configs > --- > > Key: HDFS-801 > URL: https://issues.apache.org/jira/browse/HDFS-801 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Edwin Chan > Attachments: hdfs_3.1.0.patch, hdfs_3.1.0.patch > > > In order to use SureLogic analysis tools and allow their concurrency analysis > annotations in HDFS code the annotations library has to be automatically > pulled from a Maven repo. Also, it has to be added to Eclipse .classpath > template. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1104) Fsck triggers full GC on NameNode
[ https://issues.apache.org/jira/browse/HDFS-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1104: Status: Patch Available (was: Open) > Fsck triggers full GC on NameNode > - > > Key: HDFS-1104 > URL: https://issues.apache.org/jira/browse/HDFS-1104 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Fix For: 0.22.0 > > Attachments: fsckATime.patch, fsckATime1.patch, fsckATime2.patch > > > A NameNode at one of our clusters fell into full GC while fsck was performed. > Digging into the problem shows that it is caused by how NameNode handles the > access time of a file. > Fsck calls open on every file in the checked directory to get the file's > block locations. Each open changes the file's access time and then leads to > writing a transaction entry to the edit log. The current code optimizes open > so that it returns without issuing a sync of the edit log to the disk. It > happened that in our cluster no other jobs were running while fsck was > performed. No edit log sync was ever called. So all open transactions were > kept in memory. When the edit log buffer got full, it automatically doubled > its space by allocating a new buffer. Full GC happened when no contiguous > space was found while allocating a new bigger buffer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1104) Fsck triggers full GC on NameNode
[ https://issues.apache.org/jira/browse/HDFS-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1104: Status: Open (was: Patch Available) > Fsck triggers full GC on NameNode > - > > Key: HDFS-1104 > URL: https://issues.apache.org/jira/browse/HDFS-1104 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Fix For: 0.22.0 > > Attachments: fsckATime.patch, fsckATime1.patch, fsckATime2.patch > > > A NameNode at one of our clusters fell into full GC while fsck was performed. > Digging into the problem shows that it is caused by how NameNode handles the > access time of a file. > Fsck calls open on every file in the checked directory to get the file's > block locations. Each open changes the file's access time and then leads to > writing a transaction entry to the edit log. The current code optimizes open > so that it returns without issuing a sync of the edit log to the disk. It > happened that in our cluster no other jobs were running while fsck was > performed. No edit log sync was ever called. So all open transactions were > kept in memory. When the edit log buffer got full, it automatically doubled > its space by allocating a new buffer. Full GC happened when no contiguous > space was found while allocating a new bigger buffer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1089) Remove uses of FileContext#isFile, isDirectory and exists
[ https://issues.apache.org/jira/browse/HDFS-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-1089: -- Status: Patch Available (was: Open) Kick hudson. HADOOP-6678 should have made it into the common jar by now. > Remove uses of FileContext#isFile, isDirectory and exists > - > > Key: HDFS-1089 > URL: https://issues.apache.org/jira/browse/HDFS-1089 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 0.21.0 > > Attachments: hdfs-1089-1.patch > > > Here's an HDFS jira for the second part of HADOOP-6678: removing uses of > FileContext#isFile, isDirectory and exists. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1089) Remove uses of FileContext#isFile, isDirectory and exists
[ https://issues.apache.org/jira/browse/HDFS-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-1089: -- Status: Open (was: Patch Available) > Remove uses of FileContext#isFile, isDirectory and exists > - > > Key: HDFS-1089 > URL: https://issues.apache.org/jira/browse/HDFS-1089 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 0.21.0 > > Attachments: hdfs-1089-1.patch > > > Here's an HDFS jira for the second part of HADOOP-6678: removing uses of > FileContext#isFile, isDirectory and exists. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1007) HFTP needs to be updated to use delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HDFS-1007: -- Attachment: 1007-bugfix.patch Bugfix for handling null tokens (for Y20S) > HFTP needs to be updated to use delegation tokens > - > > Key: HDFS-1007 > URL: https://issues.apache.org/jira/browse/HDFS-1007 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 0.22.0 >Reporter: Devaraj Das >Assignee: Devaraj Das > Fix For: 0.22.0 > > Attachments: 1007-bugfix.patch, distcp-hftp-2.1.1.patch, > distcp-hftp.1.patch, distcp-hftp.2.1.patch, distcp-hftp.2.patch, > distcp-hftp.patch, HDFS-1007-BP20-fix-1.patch, HDFS-1007-BP20-fix-2.patch, > HDFS-1007-BP20-fix-3.patch, HDFS-1007-BP20.patch > > > HFTPFileSystem should be updated to use the delegation tokens so that it can > talk to the secure namenodes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1001) DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
[ https://issues.apache.org/jira/browse/HDFS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-1001: -- Status: Patch Available (was: Open) > DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK > - > > Key: HDFS-1001 > URL: https://issues.apache.org/jira/browse/HDFS-1001 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: bc Wong >Assignee: bc Wong > Attachments: HDFS-1001-2.patch, HDFS-1001-3.patch, HDFS-1001-3.patch, > HDFS-1001-rebased.patch, HDFS-1001.patch, HDFS-1001.patch.1 > > > Running the TestPread with additional debug statements reveals that the > BlockReader sends CHECKSUM_OK when the DataXceiver doesn't expect it. > Currently it doesn't matter since DataXceiver closes the connection after > each op, and CHECKSUM_OK is the last thing on the wire. But if we want to > cache connections, they need to agree on the exchange of CHECKSUM_OK. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1001) DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
[ https://issues.apache.org/jira/browse/HDFS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-1001: -- Status: Open (was: Patch Available) > DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK > - > > Key: HDFS-1001 > URL: https://issues.apache.org/jira/browse/HDFS-1001 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: bc Wong >Assignee: bc Wong > Attachments: HDFS-1001-2.patch, HDFS-1001-3.patch, HDFS-1001-3.patch, > HDFS-1001-rebased.patch, HDFS-1001.patch, HDFS-1001.patch.1 > > > Running the TestPread with additional debug statements reveals that the > BlockReader sends CHECKSUM_OK when the DataXceiver doesn't expect it. > Currently it doesn't matter since DataXceiver closes the connection after > each op, and CHECKSUM_OK is the last thing on the wire. But if we want to > cache connections, they need to agree on the exchange of CHECKSUM_OK. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1001) DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
[ https://issues.apache.org/jira/browse/HDFS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862370#action_12862370 ] Eli Collins commented on HDFS-1001: --- +1 Nice change. Nits: * in DataNode "DataNode always expects" should read "always checks" since the response is optional. * Would rename readCasually to something like readBytesCheckEOS * In DataXceiver "from client" should read "from the client" > DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK > - > > Key: HDFS-1001 > URL: https://issues.apache.org/jira/browse/HDFS-1001 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: bc Wong >Assignee: bc Wong > Attachments: HDFS-1001-2.patch, HDFS-1001-3.patch, HDFS-1001-3.patch, > HDFS-1001-rebased.patch, HDFS-1001.patch, HDFS-1001.patch.1 > > > Running the TestPread with additional debug statements reveals that the > BlockReader sends CHECKSUM_OK when the DataXceiver doesn't expect it. > Currently it doesn't matter since DataXceiver closes the connection after > each op, and CHECKSUM_OK is the last thing on the wire. But if we want to > cache connections, they need to agree on the exchange of CHECKSUM_OK. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
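The review's "always checks" point can be sketched as a readBytesCheckEOS-style helper: the datanode tries to read the optional CHECKSUM_OK trailer, and EOF is acceptable because the client may have simply closed its end. The names and status value here are illustrative, not the actual wire format:

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch of reading an optional trailing status byte from
// the client; NOT the real HDFS data-transfer protocol constants.
class TrailerReader {
    static final int CHECKSUM_OK = 5; // hypothetical status value

    // Returns true if the client sent CHECKSUM_OK, false if it closed
    // the stream first; any other byte is a protocol error.
    static boolean readChecksumOk(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return false;        // EOF: client hung up, acceptable
        if (b == CHECKSUM_OK) return true;
        throw new IOException("unexpected trailer byte: " + b);
    }
}
```

Distinguishing "client closed" from "client sent garbage" is the crux once connections are cached: a stray byte left on a reused connection would desynchronize the next operation.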
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1057: --- Attachment: conurrent-reader-patch-1.txt based on hadoop root dir > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Attachments: conurrent-reader-patch-1.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1057: --- Attachment: (was: conurrent-reader-patch-1.txt) > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Attachments: conurrent-reader-patch-1.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1104) Fsck triggers full GC on NameNode
[ https://issues.apache.org/jira/browse/HDFS-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1104: Attachment: fsckATime2.patch This patch addresses Nicholas's review comments: > The unit test may not work since there is a FSNamesystem.accessTimePrecision. Changed the default precision in the test. > NameNode.getBlockLocationsNoATime(..) does not check permission. Whoops, it was in the first patch but was accidentally removed from the 2nd patch. > Fsck triggers full GC on NameNode > - > > Key: HDFS-1104 > URL: https://issues.apache.org/jira/browse/HDFS-1104 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Fix For: 0.22.0 > > Attachments: fsckATime.patch, fsckATime1.patch, fsckATime2.patch > > > A NameNode at one of our clusters fell into full GC while fsck was performed. > Digging into the problem shows that it is caused by how NameNode handles the > access time of a file. > Fsck calls open on every file in the checked directory to get the file's > block locations. Each open changes the file's access time and then leads to > writing a transaction entry to the edit log. The current code optimizes open > so that it returns without issuing a sync of the edit log to the disk. It > happened that in our cluster no other jobs were running while fsck was > performed. No edit log sync was ever called. So all open transactions were > kept in memory. When the edit log buffer got full, it automatically doubled > its space by allocating a new buffer. Full GC happened when no contiguous > space was found while allocating a new bigger buffer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
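For context, the access-time precision mentioned in the review works like this: open() only records (and logs) a new access time when the stored one is staler than the configured precision, so a metadata scan like fsck cannot flood the edit-log buffer with one transaction per file. A toy sketch of that rule (illustrative names, not the FSNamesystem code):

```java
// Toy sketch of the access-time precision rule; the field and method
// names are illustrative stand-ins for the NameNode internals.
class INodeAtime {
    long accessTime;
    final long precision; // e.g. dfs.access.time.precision, in ms

    INodeAtime(long atime, long precisionMs) {
        this.accessTime = atime;
        this.precision = precisionMs;
    }

    // Returns true when an access-time update (and hence an edit-log
    // transaction) is required; within the precision window it is a no-op.
    boolean touch(long now) {
        if (now <= accessTime + precision) {
            return false; // recent enough: no edit-log transaction
        }
        accessTime = now;
        return true;
    }
}
```

With a one-hour precision, N opens of the same file within an hour cost at most one logged transaction instead of N, which is why the unit test had to change the default precision to observe the behavior at all.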
[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862359#action_12862359 ] sam rash commented on HDFS-1057: oops, thought I had passed the right options to git diff. Will update in a bit. In the meantime, {{patch -p3 < patch.txt}} will work > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Attachments: conurrent-reader-patch-1.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862353#action_12862353 ] Jean-Daniel Cryans commented on HDFS-1057: -- Sam, can you base the patch on hadoop's root folder? It's kinda hard to apply as is. > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Attachments: conurrent-reader-patch-1.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1057: --- Attachment: conurrent-reader-patch-1.txt 0.20 test + patch > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Attachments: conurrent-reader-patch-1.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
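The race quoted in this issue is an ordering bug: BlockReceiver publishes the new length (setBytesOnDisk) before flush(), so a concurrent reader can observe bytes that are still buffered. A minimal sketch of the intended ordering (hypothetical; this is not Hadoop's actual BlockReceiver code):

```java
import java.io.IOException;
import java.io.OutputStream;

/**
 * Sketch of the ordering fix discussed in HDFS-1057: flush the packet to
 * the underlying stream *before* publishing the new visible length to
 * concurrent readers. Names and structure are illustrative only.
 */
public class PacketReceiver {
    private final OutputStream dataOut;
    private volatile long bytesOnDisk; // the length concurrent readers see

    public PacketReceiver(OutputStream dataOut) {
        this.dataOut = dataOut;
    }

    public void receivePacket(byte[] packet) {
        try {
            dataOut.write(packet);
            dataOut.flush();          // bytes reach the device first...
        } catch (IOException e) {
            throw new RuntimeException(e); // sketch: avoid checked throws
        }
        bytesOnDisk += packet.length; // ...then advertise the new length
    }

    public long getBytesOnDisk() { return bytesOnDisk; }
}
```

With this order, a reader that sees the advanced length is guaranteed the bytes (and, in the real code, their checksums) are already out of the writer's buffers.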
[jira] Commented: (HDFS-1122) client block verification may result in blocks in DataBlockScanner prematurely
[ https://issues.apache.org/jira/browse/HDFS-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862348#action_12862348 ] sam rash commented on HDFS-1122: This results in these log messages:
2010-04-21 13:06:30,951 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Adding an already existing block blk_6423942125821562308_117574
2010-04-21 12:59:47,054 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Adding an already existing block blk_-1890060265487773738_117566
2010-04-21 12:56:26,831 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Adding an already existing block blk_-8254097362836825914_117561
2010-04-21 12:53:03,386 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Adding an already existing block blk_8946894423251690136_117557
2010-04-21 12:49:43,148 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Adding an already existing block blk_-5467425469535997066_117553
2010-04-21 12:46:22,613 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Adding an already existing block blk_-3020378094937646676_117549
and it's possible the block scanner could mark the blocks corrupt since they are being written to. I have a test + 0.20 patch I will upload shortly (the crux of the patch is that client verifications can only update the DataBlockScanner, not add new blocks). > client block verification may result in blocks in DataBlockScanner prematurely > -- > > Key: HDFS-1122 > URL: https://issues.apache.org/jira/browse/HDFS-1122 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: sam rash >Assignee: sam rash > > found that when the DN uses client verification of a block that is open for > writing, it will add it to the DataBlockScanner prematurely. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
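The crux described above (client-driven verification may only refresh a block the scanner already tracks, never add one) can be sketched as follows; the class and method names are hypothetical, not the actual DataBlockScanner API:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the HDFS-1122 fix idea: a verification reported by a client
 * read only updates the scan time of a block the scanner already owns.
 * Blocks still open for writing are never added via this path, so the
 * scanner cannot prematurely examine (and flag) an in-progress block.
 */
public class BlockScanRegistry {
    private final Map<Long, Long> lastScanTime = new HashMap<>();

    /** Called when a block is finalized; only now does the scanner own it. */
    public void addFinalizedBlock(long blockId, long now) {
        lastScanTime.put(blockId, now);
    }

    /** Client read verified the block: update only, never add. */
    public boolean verifiedByClient(long blockId, long now) {
        if (!lastScanTime.containsKey(blockId)) {
            return false; // block not finalized yet; ignore the report
        }
        lastScanTime.put(blockId, now);
        return true;
    }

    public Long getLastScanTime(long blockId) {
        return lastScanTime.get(blockId);
    }
}
```

Under this rule the "Adding an already existing block" warnings disappear, because the client path never performs an add at all.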
[jira] Created: (HDFS-1122) client block verification may result in blocks in DataBlockScanner prematurely
client block verification may result in blocks in DataBlockScanner prematurely -- Key: HDFS-1122 URL: https://issues.apache.org/jira/browse/HDFS-1122 Project: Hadoop HDFS Issue Type: Sub-task Reporter: sam rash Assignee: sam rash found that when the DN uses client verification of a block that is open for writing, it will add it to the DataBlockScanner prematurely. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bc Wong updated HDFS-941: - Attachment: HDFS-941-3.patch Oops. The previous patch was in the reverse direction. > Datanode xceiver protocol should allow reuse of a connection > > > Key: HDFS-941 > URL: https://issues.apache.org/jira/browse/HDFS-941 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, hdfs client >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: bc Wong > Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, > HDFS-941-3.patch > > > Right now each connection into the datanode xceiver only processes one > operation. > In the case that an operation leaves the stream in a well-defined state (eg a > client reads to the end of a block successfully) the same connection could be > reused for a second operation. This should improve random read performance > significantly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
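The reuse proposed in this issue amounts to the xceiver looping over operations on one socket instead of closing after the first. A hypothetical sketch of such a loop (not the actual DataXceiver; op dispatch is elided):

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Sketch of the HDFS-941 idea: keep reading op codes from the same
 * connection as long as each op leaves the stream in a well-defined state,
 * instead of tearing the connection down after a single operation.
 */
public class ReusableXceiver {
    /** Returns how many ops were served before the peer closed the stream. */
    public static int serve(InputStream conn) {
        DataInputStream in = new DataInputStream(conn);
        int opsServed = 0;
        while (true) {
            try {
                in.readUnsignedByte(); // next op code, if the peer sent one
            } catch (EOFException e) {
                return opsServed;      // clean close: connection was reused
            } catch (IOException e) {
                throw new RuntimeException(e); // sketch: avoid checked throws
            }
            // A real xceiver would dispatch on the op code and process its
            // framed body here; this sketch just counts the op.
            opsServed++;
        }
    }
}
```

The design constraint the issue states is the key part: reuse is only safe when the previous op consumed exactly its own bytes, which is also why the CHECKSUM_OK disagreement in HDFS-1001 below matters.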
[jira] Commented: (HDFS-165) NPE in datanode.handshake()
[ https://issues.apache.org/jira/browse/HDFS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862338#action_12862338 ] Jakob Homan commented on HDFS-165: -- bq. I will merge [this patch] into the lifecycle patch rather than split out (as I have done here) Steve, I read this to mean this patch is no longer necessary and can be closed as WontFix? Does this sound good to you? > NPE in datanode.handshake() > --- > > Key: HDFS-165 > URL: https://issues.apache.org/jira/browse/HDFS-165 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.1 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 0.22.0 > > Attachments: HDFS-165.patch > > > It appears possible to raise an NPE in DataNode.handshake() if the startup > protocol gets interrupted or fails in some manner -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1001) DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
[ https://issues.apache.org/jira/browse/HDFS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bc Wong updated HDFS-1001: -- Attachment: HDFS-1001-3.patch Oops. The previous patch was in the reverse direction. > DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK > - > > Key: HDFS-1001 > URL: https://issues.apache.org/jira/browse/HDFS-1001 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: bc Wong >Assignee: bc Wong > Attachments: HDFS-1001-2.patch, HDFS-1001-3.patch, HDFS-1001-3.patch, > HDFS-1001-rebased.patch, HDFS-1001.patch, HDFS-1001.patch.1 > > > Running the TestPread with additional debug statements reveals that the > BlockReader sends CHECKSUM_OK when the DataXceiver doesn't expect it. > Currently it doesn't matter since DataXceiver closes the connection after > each op, and CHECKSUM_OK is the last thing on the wire. But if we want to > cache connections, they need to agree on the exchange of CHECKSUM_OK. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
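Why the agreement matters for cached connections: if the reader writes a trailing CHECKSUM_OK and the server never consumes it, that leftover byte is misparsed as the next operation's op code on a reused connection. A hypothetical sketch (the constant and framing here are illustrative, not HDFS's actual data-transfer protocol):

```java
/**
 * Sketch of the HDFS-1001 concern: the trailing CHECKSUM_OK status must be
 * written by the reader AND drained by the server, or the stream is left
 * dirty for the next op on a cached connection. Illustrative only.
 */
public class ChecksumOkProtocol {
    public static final int CHECKSUM_OK = 5; // illustrative status code

    /** Reader side: append CHECKSUM_OK after the last packet. */
    public static byte[] finishRead(byte[] lastPacket) {
        byte[] framed = new byte[lastPacket.length + 1];
        System.arraycopy(lastPacket, 0, framed, 0, lastPacket.length);
        framed[lastPacket.length] = CHECKSUM_OK;
        return framed;
    }

    /** Server side: consume the status so the stream is clean for reuse. */
    public static boolean drainStatus(java.io.InputStream in) {
        try {
            return in.read() == CHECKSUM_OK;
        } catch (java.io.IOException e) {
            throw new RuntimeException(e); // sketch: avoid checked throws
        }
    }
}
```

Either both sides perform the exchange or neither does; a one-sided change is exactly the disagreement the issue title describes.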
[jira] Commented: (HDFS-1079) HDFS implementation should throw exceptions defined in AbstractFileSystem
[ https://issues.apache.org/jira/browse/HDFS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862332#action_12862332 ] Eli Collins commented on HDFS-1079: ---
* Which tests cover all the new throws of InvalidPathException?
* Should the rest of h.hdfs.* be converted to HadoopIllegalArgumentException as well?
> HDFS implementation should throw exceptions defined in AbstractFileSystem > - > > Key: HDFS-1079 > URL: https://issues.apache.org/jira/browse/HDFS-1079 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: HDFS-1079.1.patch, HDFS-1079.patch, HDFS-1079.patch > > > HDFS implementation Hdfs.java should throw exceptions as defined in > AbstractFileSystem. To facilitate this, ClientProtocol should be changed to > throw specific exceptions, as defined in AbstractFileSystem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1079) HDFS implementation should throw exceptions defined in AbstractFileSystem
[ https://issues.apache.org/jira/browse/HDFS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1079: -- Attachment: HDFS-1079.1.patch New patch addresses the comments except the following:
# FSDirectory - out of sync with trunk
#* Declare more concrete exceptions beyond IOExceptions
#** renameTo - not changing deprecated methods
#** unprotectedRenameTo - not changing deprecated methods
# FSNamesystem
#* Remove IO exception declaration - not thrown
#** concat - checkPathAccess called from this throws IOException
#** setTimes - method throws IOException
> HDFS implementation should throw exceptions defined in AbstractFileSystem > - > > Key: HDFS-1079 > URL: https://issues.apache.org/jira/browse/HDFS-1079 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: HDFS-1079.1.patch, HDFS-1079.patch, HDFS-1079.patch > > > HDFS implementation Hdfs.java should throw exceptions as defined in > AbstractFileSystem. To facilitate this, ClientProtocol should be changed to > throw specific exceptions, as defined in AbstractFileSystem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-708) A stress-test tool for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-708: - Status: Open (was: Patch Available) Canceling patch pending review updates. > A stress-test tool for HDFS. > > > Key: HDFS-708 > URL: https://issues.apache.org/jira/browse/HDFS-708 > Project: Hadoop HDFS > Issue Type: New Feature > Components: test, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Joshua Harlow > Fix For: 0.22.0 > > Attachments: slive.patch, SLiveTest.pdf > > > It would be good to have a tool for automatic stress testing HDFS, which > would provide IO-intensive load on HDFS cluster. > The idea is to start the tool, let it run overnight, and then be able to > analyze possible failures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-708) A stress-test tool for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862287#action_12862287 ] Konstantin Shvachko commented on HDFS-708: -- Sorry, (4) was a bit vague. I meant that the start and end times should be taken right around the actual HDFS action. E.g. for write it would be
{code}
get_start_time;
outputStream.write();
get_elapsed_time;
{code}
15. Forgot to mention that SLive should have a test. It can be simple. It can call slive on local MR and local FS with some reasonable parameters, which trigger most of the code paths. An alternative is to start Mini clusters and run slive on them. The important thing is it should not take a long time to run. > A stress-test tool for HDFS. > > > Key: HDFS-708 > URL: https://issues.apache.org/jira/browse/HDFS-708 > Project: Hadoop HDFS > Issue Type: New Feature > Components: test, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Joshua Harlow > Fix For: 0.22.0 > > Attachments: slive.patch, SLiveTest.pdf > > > It would be good to have a tool for automatic stress testing HDFS, which > would provide IO-intensive load on HDFS cluster. > The idea is to start the tool, let it run overnight, and then be able to > analyze possible failures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
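The measurement discipline suggested in that review (read the clock immediately before and after the HDFS call, so harness overhead is not charged to HDFS) can be sketched with a small helper; the class is hypothetical and not part of SLive:

```java
/**
 * Sketch of the timing pattern from the review comment above: take start
 * and end times right around the actual I/O action, e.g. an
 * outputStream.write(...), and nothing else.
 */
public class OpTimer {
    /** Runs the action and returns the elapsed wall time in nanoseconds. */
    public static long timeNanos(Runnable hdfsAction) {
        long start = System.nanoTime();   // get_start_time
        hdfsAction.run();                 // e.g. outputStream.write(...)
        return System.nanoTime() - start; // get_elapsed_time
    }
}
```

Anything done outside the Runnable (argument setup, result aggregation) is deliberately excluded from the measured interval.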
[jira] Updated: (HDFS-801) Add SureLogic annotations' jar into Ivy and Eclipse configs
[ https://issues.apache.org/jira/browse/HDFS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-801: - Status: Patch Available (was: Open) Re-submitting to Hudson since it's been a while since the last run, before commit. > Add SureLogic annotations' jar into Ivy and Eclipse configs > --- > > Key: HDFS-801 > URL: https://issues.apache.org/jira/browse/HDFS-801 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Edwin Chan > Attachments: hdfs_3.1.0.patch, hdfs_3.1.0.patch > > > In order to use SureLogic analysis tools and allow their concurrency analysis > annotations in HDFS code the annotations library has to be automatically > pulled from a Maven repo. Also, it has to be added to Eclipse .classpath > template. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-801) Add SureLogic annotations' jar into Ivy and Eclipse configs
[ https://issues.apache.org/jira/browse/HDFS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-801: - Status: Open (was: Patch Available) > Add SureLogic annotations' jar into Ivy and Eclipse configs > --- > > Key: HDFS-801 > URL: https://issues.apache.org/jira/browse/HDFS-801 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build, tools >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Edwin Chan > Attachments: hdfs_3.1.0.patch, hdfs_3.1.0.patch > > > In order to use SureLogic analysis tools and allow their concurrency analysis > annotations in HDFS code the annotations library has to be automatically > pulled from a Maven repo. Also, it has to be added to Eclipse .classpath > template. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1079) HDFS implementation should throw exceptions defined in AbstractFileSystem
[ https://issues.apache.org/jira/browse/HDFS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862237#action_12862237 ] Sanjay Radia commented on HDFS-1079:
* dfsclient: The methods should declare only IOException - the actual exceptions are declared in ClientProtocol; this will make it easier to keep the exception declarations up to date.
* FSDirectory - out of sync with trunk
** Declare more concrete exceptions beyond IOExceptions
*** renameTo -
*** unprotectedRenameTo
*** getPreferredBlockSize
*** addSymlink
* FSNamesystem
** Remove IO exception declaration - not thrown
*** unprotectedConcat
*** getBlocklocations*
*** createLocatedBlock
*** concat
*** setTimes
** Throw more detailed exception
*** startFile*
*** appendFile
*** delete*
* File a Jira to clean up IOException - a better exception can be thrown
** removeBlock, removeLastBlock
** getCurrentUser
** plus a few others.
> HDFS implementation should throw exceptions defined in AbstractFileSystem > - > > Key: HDFS-1079 > URL: https://issues.apache.org/jira/browse/HDFS-1079 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: HDFS-1079.patch, HDFS-1079.patch > > > HDFS implementation Hdfs.java should throw exceptions as defined in > AbstractFileSystem. To facilitate this, ClientProtocol should be changed to > throw specific exceptions, as defined in AbstractFileSystem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
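The pattern asked for throughout this review (declare concrete IOException subclasses instead of bare IOException) looks like the following sketch; the method names and the default value are hypothetical, not the patch's code:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/**
 * Sketch of the exception-declaration pattern requested in HDFS-1079:
 * declare concrete IOException subclasses so callers (and the
 * AbstractFileSystem layer) can catch precisely, while legacy callers
 * that catch IOException keep working unchanged.
 */
public class SpecificThrows {
    /** Concrete declaration: callers can catch FileNotFoundException. */
    public static long getPreferredBlockSize(String path, boolean exists)
            throws FileNotFoundException {
        if (!exists) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return 64L * 1024 * 1024; // illustrative default block size
    }

    /** Legacy callers still compile: FileNotFoundException is-an IOException. */
    public static String describe(String path, boolean exists) {
        try {
            return "blockSize=" + getPreferredBlockSize(path, exists);
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }
}
```

Because the subclasses all extend IOException, narrowing the `throws` clause is source-compatible, which is why the review can push the change method by method.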
[jira] Commented: (HDFS-1113) Allow users with write access to a directory to change ownership of its subdirectories/files
[ https://issues.apache.org/jira/browse/HDFS-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862197#action_12862197 ] Owen O'Malley commented on HDFS-1113: - Making the file owner into a user-settable string is a huge cost to security. Making the users scan the audit log from the beginning of time to find the creator of a file isn't a great answer. Isn't the motivation really that you want to control access to the file? It seems like ACLs really answer your request (and many additional ones). > Allow users with write access to a directory to change ownership of its > subdirectories/files > > > Key: HDFS-1113 > URL: https://issues.apache.org/jira/browse/HDFS-1113 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node > Environment: All >Reporter: Milind Bhandarkar >Assignee: Sanjay Radia > > owner and group of a file/directory, and namespace/diskspace quota for a > directory are mutable attributes. If I have writable access to a directory, > say /team/MyTeam, and if there are subdirectories underneath, such as > /team/MyTeam/TeamMember1, /team/MyTeam/TeamMember2, then I should be able to > chown, chgrp, setQuota, clrQuota on TeamMemeber{1|2} subdirectories. > Currently in HDFS (and in Posix), it requires me to be a superuser to perform > these operations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1051) Umbrella Jira for Scaling the HDFS Name Service
[ https://issues.apache.org/jira/browse/HDFS-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862148#action_12862148 ] Jeff Hammerbacher commented on HDFS-1051: - Another piece of research germane to this JIRA: "Haceph: Scalable Metadata Management for Hadoop using Ceph " from UCSC (http://www.soe.ucsc.edu/~carlosm/Papers/eestolan-nsdi10-abstract.pdf) > Umbrella Jira for Scaling the HDFS Name Service > --- > > Key: HDFS-1051 > URL: https://issues.apache.org/jira/browse/HDFS-1051 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.22.0 >Reporter: Sanjay Radia >Assignee: Sanjay Radia > > The HDFS Name service currently uses a single Namenode which limits its > scalability. This is a master jira to track sub-jiras to address this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1121) Allow HDFS client to measure distribution of blocks across devices for a specific DataNode
Allow HDFS client to measure distribution of blocks across devices for a specific DataNode -- Key: HDFS-1121 URL: https://issues.apache.org/jira/browse/HDFS-1121 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Reporter: Jeff Hammerbacher As discussed on the mailing list, it would be useful if the DfsClient could measure the distribution of blocks across devices for an individual DataNode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1120) Make DataNode's block-to-device placement policy pluggable
Make DataNode's block-to-device placement policy pluggable -- Key: HDFS-1120 URL: https://issues.apache.org/jira/browse/HDFS-1120 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Jeff Hammerbacher As discussed on the mailing list, as the number of disk drives per server increases, it would be useful to allow the DataNode's policy for new block placement to grow in sophistication from the current round-robin strategy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1119) Refactor BlocksMap with GettableSet
[ https://issues.apache.org/jira/browse/HDFS-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1119: - Attachment: h1119_20100429.patch h1119_20100429.patch: - Added GettableSet interface. - Added GettableSetByHashMap, a GettableSet implementation using java.util.HashMap. - Used GettableSet in BlocksMap. - Also removed unused getLoadFactor() methods. > Refactor BlocksMap with GettableSet > --- > > Key: HDFS-1119 > URL: https://issues.apache.org/jira/browse/HDFS-1119 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE > Attachments: h1119_20100429.patch > > > The data structure required in BlocksMap is a GettableSet. See also [this > comment|https://issues.apache.org/jira/browse/HDFS-1114?focusedCommentId=12862118&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12862118]. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1119) Refactor BlocksMap with GettableSet
Refactor BlocksMap with GettableSet --- Key: HDFS-1119 URL: https://issues.apache.org/jira/browse/HDFS-1119 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE The data structure required in BlocksMap is a GettableSet. See also [this comment|https://issues.apache.org/jira/browse/HDFS-1114?focusedCommentId=12862118&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12862118]. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862118#action_12862118 ] Tsz Wo (Nicholas), SZE commented on HDFS-1114: -- The data structure we need in BlocksMap is a GettableSet.
{code}
/**
 * A set which supports the get operation.
 * @param <E> The type of the elements.
 */
public interface GettableSet<E> extends Iterable<E> {
  /**
   * @return the size of this set.
   */
  int size();

  /**
   * @return true if the given element equals to a stored element.
   *         Otherwise, return false.
   */
  boolean contains(Object element);

  /**
   * @return the stored element if there is any. Otherwise, return null.
   */
  E get(Object element);

  /**
   * Add the given element to this set.
   * @return the previous stored element if there is any.
   *         Otherwise, return null.
   */
  E add(E element);

  /**
   * Remove the element from the set.
   * @return the stored element if there is any. Otherwise, return null.
   */
  E remove(Object element);
}
{code}
> Reducing NameNode memory usage by an alternate hash table > - > > Key: HDFS-1114 > URL: https://issues.apache.org/jira/browse/HDFS-1114 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > > NameNode uses a java.util.HashMap to store BlockInfo objects. When there are > many blocks in HDFS, this map uses a lot of memory in the NameNode. We may > optimize the memory usage by a light weight hash table implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
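A minimal HashMap-backed implementation in the spirit of the GettableSetByHashMap mentioned in the HDFS-1119 patch above (a sketch, not the attached patch's code):

```java
import java.util.HashMap;
import java.util.Iterator;

/**
 * Sketch of a GettableSet backed by java.util.HashMap. Each element is
 * stored keyed by itself, so get(key) returns the *stored* instance for
 * any key that merely equals it -- the property BlocksMap needs in order
 * to hand back the canonical BlockInfo for a lightweight lookup key.
 */
public class HashGettableSet<E> implements Iterable<E> {
    private final HashMap<Object, E> map = new HashMap<>();

    public int size() { return map.size(); }

    public boolean contains(Object element) { return map.containsKey(element); }

    /** Returns the stored element equal to the argument, or null. */
    public E get(Object element) { return map.get(element); }

    /** Adds the element; returns the previously stored equal element, or null. */
    public E add(E element) { return map.put(element, element); }

    /** Removes and returns the stored element, or null if absent. */
    public E remove(Object element) { return map.remove(element); }

    @Override
    public Iterator<E> iterator() { return map.values().iterator(); }
}
```

NameNode code would then call get(lookupKey) with a cheap key object and receive the heavyweight stored instance back, instead of doing a contains-then-fetch pair of lookups.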