[ 
https://issues.apache.org/jira/browse/HDFS-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437612#comment-16437612
 ] 

Erik Krogen commented on HDFS-12828:
------------------------------------

The issue comes from how the {{XMLEventReader}} processes entity references. 
The code assumed that an XML block like {{<name>foo&amp;bar</name>}} would be 
parsed as a {{START_ELEMENT}}, a single {{CHARACTERS}} event containing 
"foo&amp;bar", and an {{END_ELEMENT}}. However, what actually appears between 
the start and end elements is three {{CHARACTERS}} events, "foo", "&", and 
"bar" (note that the entity reference has already been handled). So the fix is 
to remove the flag to process entity references and to support multiple 
contiguous {{CHARACTERS}} events.

Attached a patch with the fix and additions to the existing unit tests.
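
For illustration only, here is a minimal sketch of reading element text by appending every {{CHARACTERS}} event seen before the end tag, rather than assuming a single event. It is not the code from the patch; the class name {{ContiguousCharactersSketch}} and the method {{readElementText}} are hypothetical, and it assumes the default {{XMLInputFactory}} settings (no coalescing, entity references resolved by the parser).

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of Hadoop: shows one way to tolerate
// several contiguous CHARACTERS events inside a single element.
class ContiguousCharactersSketch {

  /**
   * Returns the text of the first element in the given XML, joining all
   * CHARACTERS events that occur before the first END_ELEMENT.
   */
  static String readElementText(String xml) {
    try {
      // Default factory settings: IS_COALESCING is not enabled, so the
      // reader is free to split the text into several events.
      XMLInputFactory factory = XMLInputFactory.newInstance();
      XMLEventReader reader =
          factory.createXMLEventReader(new StringReader(xml));
      StringBuilder text = new StringBuilder();
      boolean inElement = false;
      while (reader.hasNext()) {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement()) {
          inElement = true;
        } else if (event.isCharacters() && inElement) {
          // Append instead of overwrite: "foo", "&", "bar" -> "foo&bar".
          text.append(event.asCharacters().getData());
        } else if (event.isEndElement()) {
          reader.close();
          return text.toString();
        }
      }
      throw new IllegalArgumentException("no complete element found");
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("malformed XML", e);
    }
  }
}
```

With this sketch, {{ContiguousCharactersSketch.readElementText("<name>foo&amp;bar</name>")}} returns {{foo&bar}} however the reader splits the text into events.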

> OIV ReverseXML Processor Fails With Escaped Characters
> ------------------------------------------------------
>
>                 Key: HDFS-12828
>                 URL: https://issues.apache.org/jira/browse/HDFS-12828
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.0
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-12828.000.patch, fsimage_0000000000000000008.xml
>
>
> The HDFS OIV ReverseXML processor fails if the XML file contains escaped 
> characters:
> {code}
> ekrogen at ekrogen-ld1 in 
> ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
> ± $HADOOP_HOME/bin/hdfs dfs -fs hdfs://localhost:9000/ -ls /
> Found 4 items
> drwxr-xr-x   - ekrogen supergroup          0 2017-11-16 14:48 /foo
> drwxr-xr-x   - ekrogen supergroup          0 2017-11-16 14:49 /foo"
> drwxr-xr-x   - ekrogen supergroup          0 2017-11-16 14:50 /foo`
> drwxr-xr-x   - ekrogen supergroup          0 2017-11-16 14:49 /foo&
> {code}
> Then after doing {{saveNamespace}} on that NameNode...
> {code}
> ekrogen at ekrogen-ld1 in 
> ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
> ± $HADOOP_HOME/bin/hdfs oiv -i 
> /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008 -o 
> /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008.xml -p XML
> ekrogen at ekrogen-ld1 in 
> ~/dev/hadoop/trunk/hadoop-dist/target/hadoop-3.0.0-beta1-SNAPSHOT on trunk!
> ± $HADOOP_HOME/bin/hdfs oiv -i 
> /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008.xml -o 
> /tmp/hadoop-ekrogen/dfs/name/current/fsimage_0000000000000000008.xml.rev -p 
> ReverseXML
> OfflineImageReconstructor failed: unterminated entity ref starting with &
> org.apache.hadoop.hdfs.util.XMLUtils$UnmanglingError: unterminated entity ref 
> starting with &
>         at 
> org.apache.hadoop.hdfs.util.XMLUtils.unmangleXmlString(XMLUtils.java:232)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildrenHelper(OfflineImageReconstructor.java:383)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildrenHelper(OfflineImageReconstructor.java:379)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.loadNodeChildren(OfflineImageReconstructor.java:418)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.access$1000(OfflineImageReconstructor.java:95)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor$INodeSectionProcessor.process(OfflineImageReconstructor.java:524)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1710)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1765)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:191)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:134)
> {code}
> See the attachments for the relevant fsimage XML file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
