RE: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 1026034162" error

2016-01-18 Thread Zheng, Kai
It looks like a file being copied ended unexpectedly. You may need to find out 
which file it is, then check or read it by other means to make sure it is not 
corrupt.

Regards,
Kai
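
One way to act on this suggestion, as a rough sketch rather than a definitive procedure: compare the length HDFS reports for the file with the number of bytes that can actually be read back. The helper name check_readable_length is hypothetical, and the sketch assumes the `hdfs` CLI is on the PATH of the host where it runs.

```shell
# Sketch: compare the length HDFS reports for a file with the number of
# bytes that can actually be read back from it.
# check_readable_length is a hypothetical helper name, not part of Hadoop.
check_readable_length() {
  path="$1"
  # %b prints the file size in bytes as recorded in the file status.
  expected=$(hdfs dfs -stat %b "$path") || return 2
  # Stream the whole file and count the bytes that arrive.
  actual=$(hdfs dfs -cat "$path" | wc -c | tr -d '[:space:]')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK $path: read $actual bytes"
  else
    echo "SHORT $path: read $actual of $expected bytes"
    return 1
  fi
}
```

Calling it with the full URI of the suspect file (the file name is not given in the thread) reads the data through the same access path that distcp uses, assuming the shell can resolve that scheme. A SHORT result would point at the read path rather than at distcp itself.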

From: Buntu Dev [mailto:buntu...@gmail.com]
Sent: Tuesday, January 19, 2016 5:46 AM
To: user@hadoop.apache.org
Subject: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 
1026034162" error

I'm using distcp with these options to copy an HDFS directory from one cluster 
to another:


hadoop distcp -prb -i -update -skipcrccheck -delete \
  hftp://cluster1/user/hive/warehouse/dir1/ hdfs://cluster2/dir1/


I keep running into these EOF-related errors. What could be causing them, and 
how can I fix this?

~
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: Got EOF but currentPos = 240377856 < filelength = 1026034162
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:289)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:257)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
~~


Also, I'm using '-i' to ignore failures and continue, but distcp still retries 
3 times and then stops. Can anyone shed some light on what else could be 
going wrong?


Thanks!


Re: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 1026034162" error

2016-01-18 Thread Buntu Dev
Thanks Kai, but I checked the Parquet file that was reported to have issues,
and fsck says the file is healthy.





RE: Distcp fails with "Got EOF but currentPos = 240377856 < filelength = 1026034162" error

2016-01-18 Thread Brahma Reddy Battula
Hi Buntu Dev,

Please check the DataNode logs to find the exact root cause.
One more possible reason (apart from the one Kai mentioned) is that direct 
buffer memory is not sufficient when copying large files. If you see an OOM in 
direct buffer memory, just increase it.

Hope it’s helpful.
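
As a rough sketch of the log check described above: the JVM reports this condition as "java.lang.OutOfMemoryError: Direct buffer memory", so the logs can be searched for that string. The helper name find_direct_buffer_oom is hypothetical, and which log files to pass depends on the cluster layout, which the thread does not describe.

```shell
# Sketch: list the log files that contain the JVM's direct-buffer OOM message.
# find_direct_buffer_oom is a hypothetical helper; pass it whichever DataNode
# (or map task) log files apply on your cluster.
find_direct_buffer_oom() {
  # -l prints only the names of files with at least one match;
  # grep exits with status 1 when no file matches.
  grep -l "OutOfMemoryError: Direct buffer memory" "$@"
}
```

If a matching log turns up, the limit is normally raised with the JVM option -XX:MaxDirectMemorySize in the options of the affected process. A suitable size is site-specific and not something this thread provides.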


