RE: Mismatch in length of source:

2017-07-03 Thread Brahma Reddy Battula
Distcp with a snapshot can succeed, but open files may be captured with zero length; see HDFS-11402.

AFAIK, if you know which files are open, you can call recoverLease on them, or wait for the hard limit to expire (letting the NameNode trigger lease recovery).

i) Get the list of open files

e.g. hdfs fsck -openforwrite / -files -blocks -locations | grep -i "OPENFORWRITE:"

ii) Call recoverLease on each open file (see the sketch after this list)

e.g. hdfs debug recoverLease -path <path-of-open-file> [-retries <num-retries>]

Note: For services like HBase, where the RegionServers keep WAL files open, it is better to stop the HBase service, which will close those files automatically.

iii) and then go for distcp
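
A minimal shell sketch of steps i) and ii) combined (an illustration under assumptions, not an official procedure: it assumes the file path is the first whitespace-separated field of each OPENFORWRITE line in the fsck output and that no paths contain spaces):

  # list open-for-write files, then ask the NameNode to recover each lease
  for f in $(hdfs fsck / -openforwrite -files -blocks -locations \
               | grep -i "OPENFORWRITE" | awk '{print $1}'); do
    hdfs debug recoverLease -path "$f" -retries 3
  done

Once each path reports a successful recovery, the files are closed and distcp should see stable lengths.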



By the way, HDFS-10480 adds a way to list open files.
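
If your build already carries HDFS-10480, the listing should be available through dfsadmin; to the best of my recollection the option is:

  hdfs dfsadmin -listOpenFiles

Please verify the exact option name with "hdfs dfsadmin -help" on your version.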




--Brahma Reddy Battula

-Original Message-
From: Ulul [mailto:had...@ulul.org] 
Sent: 02 January 2017 23:05
To: user@hadoop.apache.org
Subject: Re: Mismatch in length of source:

Hi

I can't remember the exact error message, but distcp consistently fails when trying to copy open files. Is that your case?

The workaround is to snapshot prior to copying.
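
A minimal sketch of that workaround, using a hypothetical source directory /data, snapshot name s_distcp, and NameNode addresses src-nn/dst-nn (snapshots must first be allowed on the directory by an admin):

  hdfs dfsadmin -allowSnapshot /data
  hdfs dfs -createSnapshot /data s_distcp
  hadoop distcp -update hdfs://src-nn/data/.snapshot/s_distcp hdfs://dst-nn/data

Because the snapshot is immutable, file lengths cannot change under the copy, which avoids the "Mismatch in length of source" failure for files still being written.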

Ulul


On 31/12/2016 19:25, Aditya exalter wrote:
> Hi All,
>   A very happy new year to ALL.
>
>   I am facing an issue while executing distcp between two different clusters:
>
> Caused by: java.io.IOException: Mismatch in length of
> source:hdfs://ip1/xx/x and
> target:hdfs://nameservice1/xx/.distcp.tmp.attempt_1483200922993_0056_m_11_2
>
> I tried using -pb and -skipcrccheck
>
>  hadoop distcp -pb -skipcrccheck -update hdfs://ip1/xx/x 
> hdfs:////
>
> hadoop distcp -pb  hdfs://ip1/xx/x hdfs:////
>
> hadoop distcp -skipcrccheck -update
> hdfs://ip1/xx/x hdfs:////
>
>
> but nothing seems to be working. Any solutions, please?
>
>
> Regards,
> Aditya.


-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



RE: java.io.IOException on Namenode logs

2017-07-03 Thread Brahma Reddy Battula
Hi Nishant Verma

It would be great if you could mention which version of Hadoop you are using.

Apart from your findings (which I appreciate) and what daemeon mentioned, you can also check the following:


1)  Whether non-DFS used space is high (you can check in the NameNode UI, the dfsadmin report, or JMX)

2)  Whether the number of scheduled blocks is high (you can check via JMX)

If possible, enable debug logs, which can give useful information.
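
A rough sketch of those checks, assuming an unsecured cluster and the default NameNode HTTP port (50070 on Hadoop 2.x, 9870 on 3.x); treat the bean and metric names as assumptions and verify them on your version:

  # non-DFS used, per datanode and in total
  hdfs dfsadmin -report | grep -i "non dfs used"

  # FSNamesystem/FSNamesystemState beans expose capacity and pending/scheduled block counters
  curl -s 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'

  # raise block placement logging to DEBUG at runtime, no restart needed
  hadoop daemonlog -setlevel <namenode-host>:50070 \
      org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG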


--Brahma Reddy Battula

From: daemeon reiydelle [mailto:daeme...@gmail.com]
Sent: 04 July 2017 01:04
To: Nishant Verma
Cc: user
Subject: Re: java.io.IOException on Namenode logs

A possibility is that the node showing errors was not able to get a TCP
connection, or there was heavy network congestion, or (possibly) heavy garbage
collection timeouts. I would suspect the network
...
There is no sin except stupidity - Oscar Wilde
...
Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Jul 3, 2017 12:27 AM, "Nishant Verma" wrote:
Hello

I have Kafka Connect writing records to my HDFS nodes. The HDFS cluster has 3
datanodes. Last night I observed data loss in the records committed to HDFS.
There was no issue on the Kafka Connect side. However, I can see the Namenode
showing the error logs below:

java.io.IOException: File 
/topics/+tmp/testTopic/year=2017/month=07/day=03/hour=03/8237cfb7-2b3d-4d5c-ab04-924c0f647cd6_tmp
 could only be replicated to 0 nodes instead of minReplication (=1).  There are 
3 datanode(s) running and no node(s) are excluded in this operation.
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed 
to place enough replicas, still in need of 3 to reach 3 
(unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
newBlock=true) For more information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy


Before every occurrence of such a line, we see the line below:
2017-07-02 23:33:43,255 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 
on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
10.1.2.3:4982 Call#274492 Retry#0

10.1.2.3 is one of the Kafka Connect nodes.


I checked the following:

- There is no disk issue on datanodes. There is 110 GB space left in each 
datanode.
- In dfsadmin report, there are 3 live datanodes showing.
- dfs.datanode.du.reserved is used as its default value i.e. 0
- dfs.replication is set as 3.
- dfs.datanode.handler.count is used as its default value i.e. 10.
- dfs.datanode.data.dir.perm is at its default value, i.e. 700. A single user
is used everywhere, so a permission issue is unlikely. Also, writes worked
correctly for 22 hours; the problem appeared only after the 22nd hour.
- Could not find any error occurrence for this timestamp in datanode logs.
- The path where dfs.data.dir points has 64% space available on disk.

What could be the cause of this error and how to fix this? Why is it saying the 
file could only be replicated to 0 nodes when it also says there are 3 
datanodes available?

Thanks
Nishant



RE: Unsubscribe

2017-07-03 Thread Brahma Reddy Battula


It doesn't work like that. Kindly drop a mail to 
"user-unsubscr...@hadoop.apache.org"



--Brahma Reddy Battula

From: Atul Rajan [mailto:atul.raja...@gmail.com]
Sent: 03 July 2017 15:19
To: Donald Nelson
Cc: user@hadoop.apache.org
Subject: Re: Unsubscribe


Unsubscribe

On 3 July 2017 at 12:39, Donald Nelson wrote:

unsubscribe

On 07/03/2017 09:08 AM, nfs_ nfs wrote:

Unsubscribe




--
Best Regards
Atul Rajan


Re: java.io.IOException on Namenode logs

2017-07-03 Thread daemeon reiydelle
A possibility is that the node showing errors was not able to get a TCP
connection, or there was heavy network congestion, or (possibly) heavy
garbage collection timeouts. I would suspect the network

...
There is no sin except stupidity - Oscar Wilde
...
Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Jul 3, 2017 12:27 AM, "Nishant Verma" wrote:

> Hello
>
> I have Kafka Connect writing records to my HDFS nodes. The HDFS cluster
> has 3 datanodes. Last night I observed data loss in the records committed to
> HDFS. There was no issue on the Kafka Connect side. However, I can see the
> Namenode showing the error logs below:
>
> java.io.IOException: File /topics/+tmp/testTopic/year=
> 2017/month=07/day=03/hour=03/8237cfb7-2b3d-4d5c-ab04-924c0f647cd6_tmp
> could only be replicated to 0 nodes instead of minReplication (=1).  There
> are 3 datanode(s) running and no node(s) are excluded in this operation.
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.
> chooseTarget4NewBlock(BlockManager.java:1571)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
> getNewBlockTargets(FSNamesystem.java:3107)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
> getAdditionalBlock(FSNamesystem.java:3031)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> addBlock(NameNodeRpcServer.java:725)
> at org.apache.hadoop.hdfs.protocolPB.
> ClientNamenodeProtocolServerSideTranslatorPB.addBlock(
> ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> at org.apache.hadoop.hdfs.protocol.proto.
> ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1698)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy:
> Failed to place enough replicas, still in need of 3 to reach 3
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
>
>
> Before every occurrence of such a line, we see the line below:
> 2017-07-02 23:33:43,255 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 9000, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock
> from 10.1.2.3:4982 Call#274492 Retry#0
>
> 10.1.2.3 is one of the Kafka Connect nodes.
>
>
> I checked the following:
>
> - There is no disk issue on datanodes. There is 110 GB space left in each
> datanode.
> - In dfsadmin report, there are 3 live datanodes showing.
> - dfs.datanode.du.reserved is used as its default value i.e. 0
> - dfs.replication is set as 3.
> - dfs.datanode.handler.count is used as its default value i.e. 10.
> - dfs.datanode.data.dir.perm is at its default value, i.e. 700. A single
> user is used everywhere, so a permission issue is unlikely. Also, writes
> worked correctly for 22 hours; the problem appeared only after the 22nd hour.
> - Could not find any error occurrence for this timestamp in datanode logs.
> - The path where dfs.data.dir points has 64% space available on disk.
>
> What could be the cause of this error and how to fix this? Why is it
> saying the file could only be replicated to 0 nodes when it also says there
> are 3 datanodes available?
>
> Thanks
> Nishant
>
>


java.io.IOException on Namenode logs

2017-07-03 Thread Nishant Verma
Hello

I have Kafka Connect writing records to my HDFS nodes. The HDFS cluster
has 3 datanodes. Last night I observed data loss in the records committed to
HDFS. There was no issue on the Kafka Connect side. However, I can see the
Namenode showing the error logs below:

java.io.IOException: File
/topics/+tmp/testTopic/year=2017/month=07/day=03/hour=03/8237cfb7-2b3d-4d5c-ab04-924c0f647cd6_tmp
could only be replicated to 0 nodes instead of minReplication (=1).  There
are 3 datanode(s) running and no node(s) are excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy:
Failed to place enough replicas, still in need of 3 to reach 3
(unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
newBlock=true) For more information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy


Before every occurrence of such a line, we see the line below:
2017-07-02 23:33:43,255 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 9000, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.1.2.3:4982
Call#274492 Retry#0

10.1.2.3 is one of the Kafka Connect nodes.


I checked the following:

- There is no disk issue on datanodes. There is 110 GB space left in each
datanode.
- In dfsadmin report, there are 3 live datanodes showing.
- dfs.datanode.du.reserved is used as its default value i.e. 0
- dfs.replication is set as 3.
- dfs.datanode.handler.count is used as its default value i.e. 10.
- dfs.datanode.data.dir.perm is at its default value, i.e. 700. A single
user is used everywhere, so a permission issue is unlikely. Also, writes
worked correctly for 22 hours; the problem appeared only after the 22nd hour.
- Could not find any error occurrence for this timestamp in datanode logs.
- The path where dfs.data.dir points has 64% space available on disk.

What could be the cause of this error and how to fix this? Why is it saying
the file could only be replicated to 0 nodes when it also says there are 3
datanodes available?

Thanks
Nishant


Re: Unsubscribe

2017-07-03 Thread Atul Rajan
Unsubscribe


On 3 July 2017 at 12:39, Donald Nelson  wrote:

> unsubscribe
>
> On 07/03/2017 09:08 AM, nfs_ nfs wrote:
>
> Unsubscribe
>
>
>


-- 
*Best Regards*
*Atul Rajan*


Re: Unsubscribe

2017-07-03 Thread Donald Nelson

unsubscribe


On 07/03/2017 09:08 AM, nfs_ nfs wrote:


Unsubscribe





Unsubscribe

2017-07-03 Thread nfs_ nfs
Unsubscribe