Mike,

This error is not related to malformed XML files in the data you are trying to copy; it means that, for some reason, the source or destination directory listing could not be retrieved or parsed. Are you trying to copy between different versions of Hadoop? As far as I know, the destination must be writable, and distcp should be run from the destination cluster. See more here: http://hadoop.apache.org/common/docs/r0.20.2/distcp.html
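For example, something like the following, run from the hadoop2 cluster (this is only a sketch based on the hostnames in your message; the destination RPC port 8020 is an assumption, so check fs.default.name in your core-site.xml):

```shell
# Run from the destination cluster (hadoop2).
# Source is read via HFTP (read-only, works across Hadoop versions);
# destination is written via native HDFS, since HFTP cannot be written to.
# NOTE: 8020 is a common NameNode RPC port but is an assumption here --
# verify it against fs.default.name in hadoop2's core-site.xml.
hadoop distcp -update \
  hftp://mc00001:50070/ \
  hdfs://mc00000:8020/
```

The key difference from your command is that the destination uses the hdfs:// scheme rather than hftp://, because HFTP is a read-only filesystem.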
Let us know how it goes.

Thanks and Regards,
Sonal
Connect Hadoop with databases, Salesforce, FTP servers and others
<https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Mon, Feb 7, 2011 at 9:21 PM, Korb, Michael [USA] <korb_mich...@bah.com> wrote:
> I am running two instances of Hadoop on a cluster and want to copy all the
> data from hadoop1 to the updated hadoop2. From hadoop2, I am running the
> command "hadoop distcp -update hftp://mc00001:50070/ hftp://mc00000:50070/"
> where mc00001 is the namenode of hadoop1 and mc00000 is the namenode of
> hadoop2. I get the following error:
>
> 11/02/07 10:12:31 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> 11/02/07 10:12:31 INFO tools.DistCp: destPath=hftp://mc00000:50070/
> [Fatal Error] :1:215: XML document structures must start and end within the same entity.
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: invalid xml directory content
>         at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:350)
>         at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:355)
>         at org.apache.hadoop.hdfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:384)
>         at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1227)
>         at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1120)
>         at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> Caused by: org.xml.sax.SAXParseException: XML document structures must
> start and end within the same entity.
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
>         at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:344)
>         ... 9 more
>
> I am fairly certain that none of the XML files are malformed or corrupted.
> This thread (
> http://www.mail-archive.com/core-dev@hadoop.apache.org/msg18064.html)
> discusses a similar problem caused by file permissions but doesn't seem to
> offer a solution. Any help would be appreciated.
>
> Thanks,
> Mike