[jira] [Updated] (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-1448: --- Attachment: CDH-4355.txt This augments OEV to print out the highest generation stamp as part of its stats collection pass. Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.23.0 Attachments: CDH-4355.txt, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf, editsStored Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1448: - Hadoop Flags: [Reviewed] +1 patch looks good. Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.23.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Steffl updated HDFS-1448: -- Release Note: Offline edits viewer feature adds oev tool to hdfs script. Oev makes it possible to convert edits logs to/from native binary and XML formats. It uses the same framework as Offline image viewer. Example usage: $HADOOP_HOME/bin/hdfs oev -i edits -o output.xml Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.23.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1448: -- Fix Version/s: (was: 0.22.0) 0.23.0 Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.23.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Steffl updated HDFS-1448: -- Attachment: HDFS-1448-0.22-5.patch HDFS-1448-0.22-5.patch fixes findBugs warnings, test-patch results: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 16 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). The release audit warning is caused by editsStored.xml, since it's a reference file for test it does not have apache license included. As far as I can tell this is acceptable since it's a file that must be exactly same as expected output of the Offline edits viewer. Let me know if there's a (better) standard way to deal with this scenario. Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Steffl updated HDFS-1448: -- Attachment: HDFS-1448-0.22-4.patch Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Steffl updated HDFS-1448: -- Attachment: HDFS-1448-0.22-2.patch Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Steffl updated HDFS-1448: -- Attachment: HDFS-1448-0.22-1.patch Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1448: -- Attachment: Viewer hierarchy.pdf Code review: * General: ** All classes should be categorized with audience and stability ** No need for all the brackets in messages. Breaks with what passes for our style. ** Do we need to write to disk for tests? Just write to output stream * editsStored.xml ** Indent/format to make more human readable * TestOfflineImageViewer.java ** Convert getBuildDir and getCacheDir to fields, rather than re-evaluating the method each call ** Would it be better to split the single, large test into four smaller tests, with more descriptive names? ** The commented-out code should be removed. If it's useful for manual testing, it can be included in a static main in the test ** Style: runOev method calls don't follow code convention, they can be all on one line ** The methods runOevXmlToBinary/runOevBinaryToXml can be refactored to remove common code, which is most of it. ** There is no need for a separate printToScreen variable in those methods ** fileEqualIgnoreTrailingZeroes: Since largeFilename is just aliased to filename1, there is no need for filename1. Just use that name as the method parameter. ** loadFile(): I'm surprised we don't have a utility method in the test package to do this. It's a general operation and this method may be better located there. ** A larger problem is that this test doesn't use asserts to verify correctness, which will make working with it difficult. The exceptions should be converted to fully described JUnit asserts. * OfflineEditsViewerHelper.java ** Class needs Javadoc ** Is it necessary to copy the edits file? Instead, can we just leave it in place and test it there? A better option, though I don't believe supported by MiniDFSCluster, would be if we could just write the edits to a memory buffer and avoid the disk altogether. ** Commented out code: fc = FileContext.getFileContext(cluster.getURI(), config); * Tokenizer.java ** Tokenizer works specifically with EditsElements, may be good to give it more specific name. Same comment for Token. ** I'm torn on the individual Token* classes. I'd rather if there were a way of directly integrating them into edits, but that's a bridge too far for this patch. Scala case classes would be quite helpful here... ** Several referances to static method encode/decodeBase64 via instance variable *EditsLoaderCurrent.java ** The duplicated edits enums should be re-factored into shared class rather than duplicated. ** Style: case OP_CLOSE doesn't need to be surrounded by braces, as do several other cases. ** The more involved cases should be refactored into separate classes to aid readability. This may be reasonable for all the cases to be consistent. ** OP_UPDATE_MASTER_KEY this seems to be the only place we check for the possibility of working with an unsupported version. Is there a reason for this? ** The pattern: {noformat}v.visit(t.read(new Tokenizer.Token{Whatever}(EditsElement.LENGTH)));{noformat} is repeated quite a lot. Can this be refactored into a helper method to aid in readability? ** By doing a static import of the various Tokenizer classes (which can be made static) such as: {noformat}import static org.apache.hadoop.hdfs.tools.offlineEditsViewer.Tokenizer.TokenInt;{noformat} you can avoid the extra reference to Tokenizer in the visit calls. ** I'm not sure that the statistics functionality adds any value to this class. It may be better to create a separate statistics viewer that provides this information. ** Several unnecessary imports * EditsVisitor.java ** The DepthCounter duplicates the same class in the oiv. May as well create a common utility class and share it. ** Commented out code: {noformat} // abstract void visit(EditsElement element, String value) throws IOException; {noformat} ** Unnecessary import of DeprecatedUTF8 * EditsVisitorXml.java ** Consistent naming with oiv would be XmlEditsVisitor ** I believe this class is quite ripe for a shared generic implementation with oiv's Xml viewer. This is discussed more below. ** unnecessary import of Base64 class * OfflineEditsViewer.java ** Typo: This class implements and offline edits viewer, tool that (and - an) ** No need to mention note about OfflineImageViewer. ** The command line parsing and options shares quite a bit of code with the oiv and may be easy to merge. * EditsVisitorBinary.java ** The printToScreen option is ignored and doesn't make sense for this viewer. It may be fine to keep the option, but we should probably add documentation about it being ignored by some visitors ** No need for commented-out debugging code * Tokenizers.java ** Since the class is a factory perhaps TokenizerFactory is a better name? ** The file type determination can be simplified by checking for .endsWith(.xml) ** Typo:*
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1448: -- Priority: Major (was: Minor) Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Steffl updated HDFS-1448: -- Attachment: editsStored HDFS-1448-0.22.patch Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Priority: Minor Fix For: 0.22.0 Attachments: editsStored, HDFS-1448-0.22.patch Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.