[ https://issues.apache.org/jira/browse/HDFS-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437612#comment-16437612 ]
Erik Krogen commented on HDFS-12828:
------------------------------------

The issue comes from how the {{XMLEventReader}} processes entity references. The code assumed that an XML block like {{<name>foo&amp;bar</name>}} would be parsed as a {{START_ELEMENT}}, a single {{CHARACTERS}} event containing "foo&bar", and an {{END_ELEMENT}}. What actually arrives between the start and end elements is three {{CHARACTERS}} events: "foo", "&", and "bar" (note that the entity reference has already been handled). So the fix is to remove the flag to process entity references and to support multiple contiguous {{CHARACTERS}} blocks.

Attached a patch with the fix, which also supplements the existing unit tests.

> OIV ReverseXML Processor Fails With Escaped Characters
> ------------------------------------------------------
>
> Key: HDFS-12828
> URL: https://issues.apache.org/jira/browse/HDFS-12828
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.8.0
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Attachments: HDFS-12828.000.patch, fsimage_0000000000000000008.xml
>
>
> The HDFS OIV ReverseXML processor fails if the XML file contains escaped characters:
> {code}
> ekrogen at ekrogen-ld1 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
> ± $HADOOP_HOME/bin/hdfs dfs -fs hdfs://localhost:9000/ -ls /
> Found 4 items
> drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:48 /foo
> drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:49 /foo"
> drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:50 /foo`
> drwxr-xr-x - ekrogen supergroup 0 2017-11-16 14:49 /foo&
> {code}
> Then after doing {{saveNamespace}} on that NameNode...
> {code}
> ekrogen at ekrogen-ld1 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
> ± $HADOOP_HOME/bin/hdfs oiv -i /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008 -o /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008.xml -p XML
> ekrogen at ekrogen-ld1 in ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
> ± $HADOOP_HOME/bin/hdfs oiv -i /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008.xml -o /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008.xml.rev -p ReverseXML
> OfflineImageReconstructor failed: unterminated entity ref starting with &
> org.apache.hadoop.hdfs.util.XMLUtils$UnmanglingError: unterminated entity ref starting with &
>     at org.apache.hadoop.hdfs.util.XMLUtils.unmangleXmlString(XMLUtils.java:232)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildrenHelper(OfflineImageReconstructor.java:383)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildrenHelper(OfflineImageReconstructor.java:379)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildren(OfflineImageReconstructor.java:418)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.access$1000(OfflineImageReconstructor.java:95)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor$INodeSectionProcessor.process(OfflineImageReconstructor.java:524)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1710)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1765)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:191)
>     at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:134)
> {code}
> See attachments for relevant fsimage XML file.
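
For illustration only, here is a minimal sketch of the "support multiple contiguous {{CHARACTERS}} blocks" idea from the comment. It is not the attached patch: the class name {{ContiguousCharactersSketch}} and the method {{readFirstElementText}} are hypothetical, and it assumes a flat element with no nested children. It uses only the standard {{javax.xml.stream}} API and appends every {{CHARACTERS}} event it sees between the start and end element, so the result does not depend on how the parser chooses to split the text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of Hadoop: joins contiguous CHARACTERS events.
final class ContiguousCharactersSketch {

  // Returns the text content of the first element in the given XML.
  // Assumes the element is flat (no nested child elements).
  static String readFirstElementText(String xml) {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    try {
      XMLEventReader events = factory.createXMLEventReader(new StringReader(xml));
      StringBuilder text = new StringBuilder();
      boolean inElement = false;
      while (events.hasNext()) {
        XMLEvent ev = events.nextEvent();
        if (ev.isStartElement()) {
          inElement = true;
        } else if (inElement && ev.isCharacters()) {
          // The parser may deliver the text in several pieces, for example
          // around a resolved entity reference, so append every piece.
          text.append(ev.asCharacters().getData());
        } else if (ev.isEndElement()) {
          break;
        }
      }
      events.close();
      return text.toString();
    } catch (XMLStreamException e) {
      throw new IllegalStateException("could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readFirstElementText("<name>foo&amp;bar</name>"));
  }
}
```

An alternative that may also work is asking the parser to merge adjacent text itself via {{factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE)}}; whether the patch takes that route is not stated in this thread.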