[jira] Created: (HDFS-1454) Update the documentation to reflect true client caching strategy
Update the documentation to reflect true client caching strategy Key: HDFS-1454 URL: https://issues.apache.org/jira/browse/HDFS-1454 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs client Reporter: Jeff Hammerbacher As noted on the mailing list (http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3caanlkti=2csk+ay05btouo-uzv=o4w6ox2pq4nxgpd...@mail.gmail.com%3e), the Staging section of http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_design.html#Data+Organization is out of date.
[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed
[ https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920513#action_12920513 ] Doug Cutting commented on HDFS-1435: Hairong, I don't think that using Avro is critical here. Avro is primarily intended for user data. Using Avro here could simplify long-term maintenance but short-term might add a significant amount of work. So I would not file another Jira unless you intend to implement it soon. Thanks! Provide an option to store fsimage compressed - Key: HDFS-1435 URL: https://issues.apache.org/jira/browse/HDFS-1435 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: trunkImageCompress.patch Our HDFS has an fsimage as big as 20 GB. It consumes a lot of network bandwidth when the secondary NN uploads a new fsimage to the primary NN. If we could store the fsimage compressed, the problem would be greatly alleviated. I plan to provide a new configuration option, hdfs.image.compressed, with a default value of false. If it is set to true, the fsimage is stored compressed. The fsimage will have a new layout with a new field, compressed, in its header, indicating whether the namespace is stored compressed or not.
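For illustration, a config-gated compression wrapper of the kind the description proposes might look like the sketch below, assuming Hadoop's CompressionCodec API and the hdfs.image.compressed key from the description; the method, the header layout, and the choice of GzipCodec are assumptions, not the patch's actual code.
{noformat}
import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

class ImageWriterSketch {
  // Hypothetical: open the image stream, record the "compressed" flag in the
  // header, and wrap everything after the header in a codec when enabled.
  static DataOutputStream openImageStream(Configuration conf, File imageFile)
      throws IOException {
    boolean compress = conf.getBoolean("hdfs.image.compressed", false);
    OutputStream out = new BufferedOutputStream(new FileOutputStream(imageFile));
    new DataOutputStream(out).writeBoolean(compress); // assumed header field
    if (compress) {
      CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
      out = codec.createOutputStream(out);
    }
    return new DataOutputStream(out);
  }
}
{noformat}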
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1448: -- Attachment: Viewer hierarchy.pdf Code review: * General: ** All classes should be categorized with audience and stability ** No need for all the brackets in messages. Breaks with what passes for our style. ** Do we need to write to disk for tests? Just write to an output stream * editsStored.xml ** Indent/format to make it more human-readable * TestOfflineImageViewer.java ** Convert getBuildDir and getCacheDir to fields, rather than re-evaluating the method on each call ** Would it be better to split the single, large test into four smaller tests with more descriptive names? ** The commented-out code should be removed. If it's useful for manual testing, it can be included in a static main in the test ** Style: the runOev method calls don't follow the code convention; they can all be on one line ** The methods runOevXmlToBinary/runOevBinaryToXml can be refactored to remove common code, which is most of it. ** There is no need for a separate printToScreen variable in those methods ** fileEqualIgnoreTrailingZeroes: Since largeFilename is just aliased to filename1, there is no need for filename1. Just use that name as the method parameter. ** loadFile(): I'm surprised we don't have a utility method in the test package to do this. It's a general operation and this method may be better located there. ** A larger problem is that this test doesn't use asserts to verify correctness, which will make working with it difficult. The exceptions should be converted to fully described JUnit asserts. * OfflineEditsViewerHelper.java ** Class needs Javadoc ** Is it necessary to copy the edits file? Instead, can we just leave it in place and test it there? A better option, though I don't believe it is supported by MiniDFSCluster, would be if we could just write the edits to a memory buffer and avoid the disk altogether. ** Commented-out code: fc = FileContext.getFileContext(cluster.getURI(), config); * Tokenizer.java ** Tokenizer works specifically with EditsElements, so it may be good to give it a more specific name. Same comment for Token. ** I'm torn on the individual Token* classes. I'd rather there were a way of directly integrating them into edits, but that's a bridge too far for this patch. Scala case classes would be quite helpful here... ** Several references to the static methods encode/decodeBase64 via an instance variable * EditsLoaderCurrent.java ** The duplicated edits enums should be refactored into a shared class rather than duplicated. ** Style: case OP_CLOSE doesn't need to be surrounded by braces, nor do several other cases. ** The more involved cases should be refactored into separate classes to aid readability. It may be reasonable to do this for all the cases, for consistency. ** OP_UPDATE_MASTER_KEY: this seems to be the only place we check for the possibility of working with an unsupported version. Is there a reason for this? ** The pattern: {noformat}v.visit(t.read(new Tokenizer.Token{Whatever}(EditsElement.LENGTH)));{noformat} is repeated quite a lot. Can this be refactored into a helper method to aid readability? (A possible shape is sketched after this review.) ** By doing a static import of the various Tokenizer classes (which can be made static) such as: {noformat}import static org.apache.hadoop.hdfs.tools.offlineEditsViewer.Tokenizer.TokenInt;{noformat} you can avoid the extra reference to Tokenizer in the visit calls. ** I'm not sure that the statistics functionality adds any value to this class. 
It may be better to create a separate statistics viewer that provides this information. ** Several unnecessary imports * EditsVisitor.java ** The DepthCounter duplicates the same class in the oiv. May as well create a common utility class and share it. ** Commented-out code: {noformat} // abstract void visit(EditsElement element, String value) throws IOException; {noformat} ** Unnecessary import of DeprecatedUTF8 * EditsVisitorXml.java ** Consistent naming with the oiv would be XmlEditsVisitor ** I believe this class is quite ripe for a shared generic implementation with the oiv's Xml viewer. This is discussed more below. ** Unnecessary import of the Base64 class * OfflineEditsViewer.java ** Typo: This class implements and offline edits viewer, tool that (and -> an) ** No need for the note about OfflineImageViewer. ** The command line parsing and options share quite a bit of code with the oiv and may be easy to merge. * EditsVisitorBinary.java ** The printToScreen option is ignored and doesn't make sense for this viewer. It may be fine to keep the option, but we should probably add documentation about it being ignored by some visitors ** No need for commented-out debugging code * Tokenizers.java ** Since the class is a factory, perhaps TokenizerFactory is a better name? ** The file type determination can be simplified by checking for .endsWith(".xml") ** Typo:*
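The helper-method refactor suggested for the repeated visit/read pattern might look like the following sketch; Tokenizer, EditsElement, and the visitor type are taken from the review above, while the method name and exact signature are assumptions.
{noformat}
// Hypothetical helper collapsing the repeated pattern from the review;
// with the suggested static import the call sites shrink further.
private void readAndVisitInt(EditsVisitor v, Tokenizer t, EditsElement element)
    throws IOException {
  v.visit(t.read(new Tokenizer.TokenInt(element)));
}

// before: v.visit(t.read(new Tokenizer.TokenInt(EditsElement.LENGTH)));
// after:  readAndVisitInt(v, t, EditsElement.LENGTH);
{noformat}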
[jira] Resolved: (HDFS-1453) Need a command line option in RaidShell to fix blocks using raid
[ https://issues.apache.org/jira/browse/HDFS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali resolved HDFS-1453. --- Resolution: Invalid RAID is an MR project; will reopen this under MR. Need a command line option in RaidShell to fix blocks using raid Key: HDFS-1453 URL: https://issues.apache.org/jira/browse/HDFS-1453 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/raid Reporter: Ramkumar Vadali RaidShell currently has an option to recover a file and return the path to the recovered file. The administrator can then rename the recovered file to the damaged file. The problem with this is that the file metadata is altered, specifically the modification time. Instead, we need a way to repair just the damaged blocks and send the fixed blocks to a datanode. Once this is done, we can put automation around it.
[jira] Commented: (HDFS-1449) HDFS federation: Fix unit test failures
[ https://issues.apache.org/jira/browse/HDFS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920744#action_12920744 ] Jitendra Nath Pandey commented on HDFS-1449: The getBlockName method in ExtendedBlock just calls getBlockName on the encapsulated block object. Since block names may not be unique across the block pools, two different ExtendedBlock objects can return the same blockName. Could this be an issue? HDFS federation: Fix unit test failures --- Key: HDFS-1449 URL: https://issues.apache.org/jira/browse/HDFS-1449 Project: Hadoop HDFS Issue Type: Bug Affects Versions: Federation Branch Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: Federation Branch Attachments: HDFS-1449.patch Unit tests are failing due to ExtendedBlock#getBlockName() returning an invalid block file name.
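One way to address the uniqueness concern would be to qualify the name with the block pool ID, as in the purely illustrative sketch below; the field names are assumptions, not the federation branch's actual code.
{noformat}
// Hypothetical: prefix the pool ID so names stay unique across block pools,
// e.g. "BP-1:blk_42" rather than just "blk_42".
public String getBlockName() {
  return blockPoolId + ":" + block.getBlockName();
}
{noformat}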
[jira] Created: (HDFS-1455) Record DFS client/cli id with username/kerberos session token in audit log or hdfs client trace log
Record DFS client/cli id with username/kerberos session token in audit log or hdfs client trace log -- Key: HDFS-1455 URL: https://issues.apache.org/jira/browse/HDFS-1455 Project: Hadoop HDFS Issue Type: New Feature Reporter: Eric Yang HDFS usage is commonly calculated by running dfs -dus and grouping directory usage by user at a fixed interval. This approach does not show accurate HDFS usage if a lot of read/write activity on equivalent amounts of data happens within that interval. In order to identify usage of such patterns, usage could instead be measured from the bytes read and bytes written in the hdfs client trace log. There is currently no association of the DFSClient ID or CLI ID with the user or session token emitted in Hadoop's hdfs client trace log files. This JIRA is to record the DFS Client ID/CLI ID with the user name/session token in an appropriate place for more precise measurement of HDFS usage.
[jira] Commented: (HDFS-1455) Record DFS client/cli id with username/kerberos session token in audit log or hdfs client trace log
[ https://issues.apache.org/jira/browse/HDFS-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920759#action_12920759 ] Jakob Homan commented on HDFS-1455: --- So the point of this would be to build an offline MR tool to get a better picture of individuals' hdfs usage? Including the info in the log would not make it available for real-time analysis. Record DFS client/cli id with username/kerberos session token in audit log or hdfs client trace log -- Key: HDFS-1455 URL: https://issues.apache.org/jira/browse/HDFS-1455 Project: Hadoop HDFS Issue Type: New Feature Reporter: Eric Yang HDFS usage is commonly calculated by running dfs -dus and grouping directory usage by user at a fixed interval. This approach does not show accurate HDFS usage if a lot of read/write activity on equivalent amounts of data happens within that interval. In order to identify usage of such patterns, usage could instead be measured from the bytes read and bytes written in the hdfs client trace log. There is currently no association of the DFSClient ID or CLI ID with the user or session token emitted in Hadoop's hdfs client trace log files. This JIRA is to record the DFS Client ID/CLI ID with the user name/session token in an appropriate place for more precise measurement of HDFS usage.
[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed
[ https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920770#action_12920770 ] Hairong Kuang commented on HDFS-1435: - @Lu, compressing the fsimage has the additional advantage of reducing disk I/O as well as network bandwidth when writing to a remote copy. I like your proposed optimizations, such as limiting the transmission speed and not downloading an fsimage if the one at the primary NameNode is the same as the one at the secondary NameNode. Could you please contribute those back to the community? @Doug, thanks for your feedback. I hope we will get some time to work on the Avro format soon. Provide an option to store fsimage compressed - Key: HDFS-1435 URL: https://issues.apache.org/jira/browse/HDFS-1435 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: trunkImageCompress.patch Our HDFS has an fsimage as big as 20 GB. It consumes a lot of network bandwidth when the secondary NN uploads a new fsimage to the primary NN. If we could store the fsimage compressed, the problem would be greatly alleviated. I plan to provide a new configuration option, hdfs.image.compressed, with a default value of false. If it is set to true, the fsimage is stored compressed. The fsimage will have a new layout with a new field, compressed, in its header, indicating whether the namespace is stored compressed or not.
[jira] Commented: (HDFS-1455) Record DFS client/cli id with username/kerberos session token in audit log or hdfs client trace log
[ https://issues.apache.org/jira/browse/HDFS-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920774#action_12920774 ] Eric Yang commented on HDFS-1455: - Correct, this is for reporting purposes. It is possible to stream the hdfs client trace through the syslog protocol to Chukwa to get near-real-time analysis. Record DFS client/cli id with username/kerberos session token in audit log or hdfs client trace log -- Key: HDFS-1455 URL: https://issues.apache.org/jira/browse/HDFS-1455 Project: Hadoop HDFS Issue Type: New Feature Reporter: Eric Yang HDFS usage is commonly calculated by running dfs -dus and grouping directory usage by user at a fixed interval. This approach does not show accurate HDFS usage if a lot of read/write activity on equivalent amounts of data happens within that interval. In order to identify usage of such patterns, usage could instead be measured from the bytes read and bytes written in the hdfs client trace log. There is currently no association of the DFSClient ID or CLI ID with the user or session token emitted in Hadoop's hdfs client trace log files. This JIRA is to record the DFS Client ID/CLI ID with the user name/session token in an appropriate place for more precise measurement of HDFS usage.
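As a hedged sketch of the offline aggregation being discussed: the snippet below sums bytes per client ID from client-trace log lines. The field labels (cliID:, bytes:) and the overall line format are assumptions about the trace layout, not a documented API.
{noformat}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical usage-report pass over a client-trace log: total bytes per
// client ID. Assumes lines carry "cliID: <id>" and "bytes: <n>" fields.
public class ClientTraceUsage {
  private static final Pattern CLI = Pattern.compile("cliID: (\\S+)");
  private static final Pattern BYTES = Pattern.compile("bytes: (\\d+)");

  public static void main(String[] args) throws IOException {
    Map<String, Long> bytesPerClient = new HashMap<String, Long>();
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    String line;
    while ((line = in.readLine()) != null) {
      Matcher c = CLI.matcher(line);
      Matcher b = BYTES.matcher(line);
      if (c.find() && b.find()) {
        Long prev = bytesPerClient.get(c.group(1));
        long n = Long.parseLong(b.group(1));
        bytesPerClient.put(c.group(1), prev == null ? n : prev + n);
      }
    }
    in.close();
    System.out.println(bytesPerClient);
  }
}
{noformat}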
[jira] Updated: (HDFS-1435) Provide an option to store fsimage compressed
[ https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1435: Attachment: trunkImageCompress1.patch When I was doing performance testing, I found that trunkImageCompress.patch has a bug: it does not use a buffered input stream to read an old image. This patch fixes that performance degradation. Provide an option to store fsimage compressed - Key: HDFS-1435 URL: https://issues.apache.org/jira/browse/HDFS-1435 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: trunkImageCompress.patch, trunkImageCompress1.patch Our HDFS has an fsimage as big as 20 GB. It consumes a lot of network bandwidth when the secondary NN uploads a new fsimage to the primary NN. If we could store the fsimage compressed, the problem would be greatly alleviated. I plan to provide a new configuration option, hdfs.image.compressed, with a default value of false. If it is set to true, the fsimage is stored compressed. The fsimage will have a new layout with a new field, compressed, in its header, indicating whether the namespace is stored compressed or not.
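For context, the shape of that fix is simply buffering the raw file stream before deserializing, as in this generic sketch (not the patch's actual code; the variable names are illustrative):
{noformat}
// Unbuffered: every small readInt()/readByte() hits the file directly.
// DataInputStream in = new DataInputStream(new FileInputStream(imageFile));

// Buffered: small reads are served from an in-memory buffer instead.
DataInputStream in = new DataInputStream(
    new BufferedInputStream(new FileInputStream(imageFile)));
{noformat}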
[jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920813#action_12920813 ] Konstantin Shvachko commented on HDFS-1448: --- I like Jakob's idea to isolate the duplicate code common to the image and edits viewers. The hierarchy of classes looks good. I especially like the potential of reusing the same code for the actual image and edits loading and the viewers. Implementing plan #1 seems to be the shortest path to success. If we go with plan 1, it would be good to minimize public methods; that makes refactoring easier. Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Priority: Minor Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create a multi-format parser for edits log files, supporting binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by the FSEditLog class to read/write the edits file. The primary reason to develop this tool is to help with troubleshooting: the binary format is hard for human troubleshooters to read and edit. Longer term, it could be used to clean up and minimize the parsers for the fsimage and edits files. The edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. The next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on the adoption of Avro (which would completely change how objects are serialized, as well as provide ways to convert files to different formats).
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1448: -- Priority: Major (was: Minor) Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create a multi-format parser for edits log files, supporting binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by the FSEditLog class to read/write the edits file. The primary reason to develop this tool is to help with troubleshooting: the binary format is hard for human troubleshooters to read and edit. Longer term, it could be used to clean up and minimize the parsers for the fsimage and edits files. The edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. The next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on the adoption of Avro (which would completely change how objects are serialized, as well as provide ways to convert files to different formats).
[jira] Assigned: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko reassigned HDFS-1448: - Assignee: Erik Steffl Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create a multi-format parser for edits log files, supporting binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by the FSEditLog class to read/write the edits file. The primary reason to develop this tool is to help with troubleshooting: the binary format is hard for human troubleshooters to read and edit. Longer term, it could be used to clean up and minimize the parsers for the fsimage and edits files. The edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. The next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on the adoption of Avro (which would completely change how objects are serialized, as well as provide ways to convert files to different formats).
[jira] Created: (HDFS-1456) Provide builder for constructing instances of MiniDFSCluster
Provide builder for constructing instances of MiniDFSCluster Key: HDFS-1456 URL: https://issues.apache.org/jira/browse/HDFS-1456 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 0.22.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.22.0 Time to fix a broken window. Of the 293 occurrences of new MiniDFSCluster(... most look something like: {noformat}cluster = new MiniDFSCluster(0, config, numDatanodes, true, false, true, null, null, null, null);{noformat} The largest constructor takes 10 parameters, and even the overloaded constructors can be difficult to read, as the arguments are mainly nulls or booleans. We should provide a Builder for constructing MiniDFSClusters to improve readability.
[jira] Updated: (HDFS-1456) Provide builder for constructing instances of MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1456: -- Attachment: HDFS-1456.patch The patch creates a new Builder class for constructing MiniDFSClusters. What before was: {noformat}cluster = new MiniDFSCluster(0, conf, NUM_DATA_NODES, true, false, true, null, null, null, null);{noformat} can now be expressed as {noformat}cluster = new MiniDFSCluster.Builder(conf) .numDataNodes(NUM_DATA_NODES) .manageNameDfsDirs(false).build();{noformat} I've converted a few instances to the new Builder. If people like this idea, I'll convert the rest, mainly through automation to avoid human error, but I wanted an easy-to-read patch before one filled with auto-refactoring. We can deprecate the MiniDFSCluster constructors as well. Provide builder for constructing instances of MiniDFSCluster Key: HDFS-1456 URL: https://issues.apache.org/jira/browse/HDFS-1456 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 0.22.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.22.0 Attachments: HDFS-1456.patch Time to fix a broken window. Of the 293 occurrences of new MiniDFSCluster(... most look something like: {noformat}cluster = new MiniDFSCluster(0, config, numDatanodes, true, false, true, null, null, null, null);{noformat} The largest constructor takes 10 parameters, and even the overloaded constructors can be difficult to read, as the arguments are mainly nulls or booleans. We should provide a Builder for constructing MiniDFSClusters to improve readability.
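From the two snippets, the Builder presumably has roughly the shape sketched below; the defaults and anything beyond numDataNodes/manageNameDfsDirs are guesses, not the patch's contents.
{noformat}
// Rough shape implied by the before/after snippets; defaults are assumptions.
public static class Builder {
  private final Configuration conf;
  private int numDataNodes = 1;
  private boolean manageNameDfsDirs = true;

  public Builder(Configuration conf) { this.conf = conf; }

  public Builder numDataNodes(int n) {
    this.numDataNodes = n;
    return this;
  }

  public Builder manageNameDfsDirs(boolean manage) {
    this.manageNameDfsDirs = manage;
    return this;
  }

  public MiniDFSCluster build() throws IOException {
    // Delegates to the old 10-argument constructor shown in the description.
    return new MiniDFSCluster(0, conf, numDataNodes, true, manageNameDfsDirs,
                              true, null, null, null, null);
  }
}
{noformat}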
[jira] Commented: (HDFS-1456) Provide builder for constructing instances of MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920835#action_12920835 ] Philip Zeyliger commented on HDFS-1456: --- +1. I'm a big fan of this suggestion. One thing that might be interesting would be to cache MiniDFSClusters. I have a theory (not tested) that many of the tests would work just fine using the same cluster over and over again. If we had some sort of default cached instance, the tests might pass faster. Provide builder for constructing instances of MiniDFSCluster Key: HDFS-1456 URL: https://issues.apache.org/jira/browse/HDFS-1456 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 0.22.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.22.0 Attachments: HDFS-1456.patch Time to fix a broken window. Of the 293 occurrences of new MiniDFSCluster(... most look something like: {noformat}cluster = new MiniDFSCluster(0, config, numDatanodes, true, false, true, null, null, null, null);{noformat} The largest constructor takes 10 parameters, and even the overloaded constructors can be difficult to read, as the arguments are mainly nulls or booleans. We should provide a Builder for constructing MiniDFSClusters to improve readability.
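Philip's caching idea could look something like the sketch below; it is entirely hypothetical, not part of HDFS-1456's patch, and it sidesteps the question of tests that mutate cluster-wide state.
{noformat}
// Hypothetical shared-cluster cache for tests that can tolerate a reused
// namespace; tests needing a pristine cluster would still build their own.
public class SharedCluster {
  private static MiniDFSCluster cached;

  public static synchronized MiniDFSCluster get(Configuration conf)
      throws IOException {
    if (cached == null) {
      cached = new MiniDFSCluster.Builder(conf).build();
    }
    return cached;
  }
}
{noformat}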