RE: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 1026034162" error
Looks like a file it’s copying ended unexpectedly. You may need to find out which file it is, then check it or read it by other means to make sure it is not corrupt.

Regards,
Kai

From: Buntu Dev [mailto:buntu...@gmail.com]
Sent: Tuesday, January 19, 2016 5:46 AM
To: user@hadoop.apache.org
Subject: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 1026034162" error

I'm using distcp with these options to copy an HDFS directory from one cluster to another:

hadoop distcp -prb -i -update -skipcrccheck -delete hftp://cluster1/user/hive/warehouse/dir1/ hdfs://cluster2/dir1/

I keep running into these EOF-related errors. What could be causing them, and how can I fix this?

~
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: Got EOF but currentPos = 240377856 < filelength = 1026034162
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:289)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:257)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
	... 11 more
~~

Also, I'm using '-i' to ignore failures and continue, but distcp still retries 3 times and then stops. Can anyone shed some light on what else could be going wrong?

Thanks!
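One way to act on the suggestion to read the file by other means is to stream it to the end and compare the byte count with the length the file system reports. The Python sketch below only illustrates that comparison. The function name `read_fully` and its parameters are hypothetical, and it is not DistCp's own code; it just raises an error with the same wording as the message in the report when the stream ends early.

```python
import io
from typing import BinaryIO


def read_fully(stream: BinaryIO, expected_length: int, chunk_size: int = 1 << 20) -> int:
    """Read the stream until EOF and return the number of bytes seen.

    Raises IOError if EOF arrives before expected_length bytes were read.
    """
    pos = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means EOF
            break
        pos += len(chunk)
    if pos < expected_length:
        raise IOError(
            f"Got EOF but currentPos = {pos} < filelength = {expected_length}"
        )
    return pos


if __name__ == "__main__":
    # Stand-in data: 10 bytes arrive, but the reported length is 16.
    truncated = io.BytesIO(b"0123456789")
    try:
        read_fully(truncated, expected_length=16)
    except IOError as err:
        print(err)
```

Against a real cluster, the stream could be `sys.stdin.buffer` fed by the output of `hdfs dfs -cat` for the suspect file, with the expected length taken from the file listing. Reading through the same `hftp://` source that distcp uses, and then through `hdfs://` directly, may help show whether the short read is tied to the file or to the transport.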
Re: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 1026034162" error
Thanks Kai, but I checked the Parquet file that was reported to have issues, and fsck says the file is healthy.
RE: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 1026034162" error
Hi Buntu Dev,

Please check the DataNode logs to find the exact root cause. One more possible cause (apart from the one Kai mentioned) is that there is not enough direct buffer memory while copying large files. If you see an OOM in direct buffer memory, just increase it.

Hope this helps.
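To check the logs for the direct buffer case, it may be enough to search for the message the JVM normally prints when direct memory runs out, `java.lang.OutOfMemoryError: Direct buffer memory`. The Python sketch below scans log lines for that text. The function name `find_direct_buffer_oom` and the sample log lines are made up for illustration and do not come from any real DataNode log.

```python
from typing import Iterable, List, Tuple

# Text the JVM normally includes when direct (off-heap) buffer memory is exhausted.
MARKER = "OutOfMemoryError: Direct buffer memory"


def find_direct_buffer_oom(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines that contain the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real log.
    sample = [
        "INFO starting transfer\n",
        "ERROR java.lang.OutOfMemoryError: Direct buffer memory\n",
        "INFO transfer finished\n",
    ]
    for number, line in find_direct_buffer_oom(sample):
        print(f"{number}: {line}")
```

If such lines do show up, the JVM option `-XX:MaxDirectMemorySize` is the usual way to raise the limit. Where to set it for the affected process depends on how the cluster is configured, so treat that as something to confirm for your deployment.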