Mike,

I've seen this when a directory has been removed or goes missing between the time distcp starts stat-ing the source files and the time it copies them. You'll probably want to make sure that no code or person is modifying the filesystem during your copy. I would also make sure you have only one version of Hadoop installed on your destination cluster. And use hdfs as the destination protocol, running the command as the hdfs user if you're using Hadoop security.
Example (running on the destination cluster):

    sudo -u hdfs /usr/lib/hadoop-0.20.3/bin/hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:8020/

Cheers,
-Xavier

On 2/7/11 9:39 AM, Korb, Michael [USA] wrote:
> I'm trying to copy from 0.20.2 to 0.20.3. I was trying to follow the DistCp
> Guide but I think I know the problem. I'm trying to run the command on the
> destination cluster, but when I call hadoop, I think the path is set to run
> the hadoop1 executable. So I tried going to the hadoop2 install and running
> it with "./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310/"
> but now I get this error:
>
> 11/02/07 12:38:09 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> 11/02/07 12:38:09 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
>         at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
>         at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> ________________________________________
> From: Sonal Goyal [[email protected]]
> Sent: Monday, February 07, 2011 12:11 PM
> To: [email protected]
> Subject: Re: Hadoop XML Error
>
> Mike,
>
> This error is not related to malformed XML files you are trying to copy;
> it occurs because, for some reason, the source or destination listing
> cannot be retrieved or parsed. Are you trying to copy between different
> versions of clusters? As far as I know, your destination should be
> writable, and distcp should be run from the destination cluster. See more
> here: http://hadoop.apache.org/common/docs/r0.20.2/distcp.html
>
> Let us know how it goes.
>
> Thanks and Regards,
> Sonal
> <https://github.com/sonalgoyal/hiho> Connect Hadoop with databases,
> Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
> On Mon, Feb 7, 2011 at 9:21 PM, Korb, Michael [USA]
> <[email protected]> wrote:
>
>> I am running two instances of Hadoop on a cluster and want to copy all the
>> data from hadoop1 to the updated hadoop2. From hadoop2, I am running the
>> command "hadoop distcp -update hftp://mc00001:50070/ hftp://mc00000:50070/"
>> where mc00001 is the namenode of hadoop1 and mc00000 is the namenode of
>> hadoop2. I get the following error:
>>
>> 11/02/07 10:12:31 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
>> 11/02/07 10:12:31 INFO tools.DistCp: destPath=hftp://mc00000:50070/
>> [Fatal Error] :1:215: XML document structures must start and end within the
>> same entity.
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: java.io.IOException: invalid xml directory content
>>         at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:350)
>>         at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:355)
>>         at org.apache.hadoop.hdfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:384)
>>         at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1227)
>>         at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1120)
>>         at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>> Caused by: org.xml.sax.SAXParseException: XML document structures must
>> start and end within the same entity.
>>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
>>         at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:344)
>>         ... 9 more
>>
>> I am fairly certain that none of the XML files are malformed or corrupted.
>> This thread (
>> http://www.mail-archive.com/[email protected]/msg18064.html)
>> discusses a similar problem caused by file permissions but doesn't seem to
>> offer a solution. Any help would be appreciated.
>>
>> Thanks,
>> Mike
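[Editor's note] To pull the thread's resolution together: read the source over hftp (the read-only interface, which tolerates version differences between 0.20.x clusters), write over hdfs, run on the destination cluster, and invoke that cluster's own hadoop binary so the DistCp class matches its libraries (mixing binaries and jars from different installs is what produces the NoSuchMethodError above). A minimal sketch using the hostnames, ports, and install path quoted in the thread; the echo only prints the command, so drop it to actually start the copy:

```shell
# Cross-version distcp pattern from the thread (run on the DESTINATION cluster).
SRC="hftp://mc00001:50070/"   # hadoop1 namenode: hftp, the read-only HTTP port
DST="hdfs://mc00000:8020/"    # hadoop2 namenode: native hdfs RPC port

# Use the destination cluster's own hadoop binary, not whichever one is
# first on $PATH, so the DistCp code matches the cluster's jars.
HADOOP="/usr/lib/hadoop-0.20.3/bin/hadoop"

# Print the command to verify it; remove 'echo' to run the copy for real.
echo sudo -u hdfs "$HADOOP" distcp -update "$SRC" "$DST"
```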
