[jira] [Created] (HDFS-14045) Use different metrics in DataNode to better measure latency of heartbeat/blockReports/incrementalBlockReports of Active/Standby NN

2018-11-01 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-14045:


 Summary: Use different metrics in DataNode to better measure 
latency of heartbeat/blockReports/incrementalBlockReports of Active/Standby NN
 Key: HDFS-14045
 URL: https://issues.apache.org/jira/browse/HDFS-14045
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Jiandan Yang 


Currently the DataNode uses the same metrics to measure the RPC latency to both NameNodes, but the Active and the Standby usually perform differently at any given time, especially in a large cluster. For example, the RPC latency of the Standby is very high while it is catching up on the edit log, which can make us misjudge the state of HDFS. Using separate metrics for the Active and the Standby gives us more precise metric data.
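
A minimal sketch of the idea using the hadoop-common metrics2 library; the class, the metric names, and the isActive flag below are illustrative assumptions, not the actual DataNode code:

{code:java}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative sketch: keep one rate per (operation, NN HA state) instead of a
// single shared rate, so Active and Standby latencies can be read separately.
public class PerNameNodeLatencyMetrics {
  private final MetricsRegistry registry = new MetricsRegistry("DataNodeActivity");
  private final MutableRate heartbeatsForActiveNN = registry.newRate("HeartbeatsForActiveNN");
  private final MutableRate heartbeatsForStandbyNN = registry.newRate("HeartbeatsForStandbyNN");

  /** Record a heartbeat latency against the metric matching the NN's HA state. */
  public void addHeartbeat(long latencyMs, boolean nnIsActive) {
    if (nnIsActive) {
      heartbeatsForActiveNN.add(latencyMs);
    } else {
      heartbeatsForStandbyNN.add(latencyMs);
    }
  }
}
{code}
The same split would apply to the block report and incremental block report rates.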






[jira] [Created] (HDFS-13984) getFileInfo of libhdfs call NameNode#getFileStatus twice

2018-10-11 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-13984:


 Summary: getFileInfo of libhdfs call NameNode#getFileStatus twice
 Key: HDFS-13984
 URL: https://issues.apache.org/jira/browse/HDFS-13984
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: libhdfs
Reporter: Jiandan Yang 
Assignee: Jiandan Yang 


getFileInfo in hdfs.c calls *FileSystem#exists* first and then *FileSystem#getFileStatus*. *FileSystem#exists* itself also calls *FileSystem#getFileStatus*, as follows:
{code:java}
  public boolean exists(Path f) throws IOException {
try {
  return getFileStatus(f) != null;
} catch (FileNotFoundException e) {
  return false;
}
  }
{code}

So a single getFileInfo call from libhdfs ends up invoking NameNodeRpcServer#getFileInfo twice.
We can implement it with a single call, as sketched below.
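
A rough sketch of what the libhdfs-side logic could do instead (the helper below is hypothetical, not the actual hdfs.c code), issuing only one getFileStatus call:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetFileInfoOnce {
  /** Hypothetical helper: one NameNode#getFileInfo RPC instead of exists() + getFileStatus(). */
  public static FileStatus getFileInfoOnce(FileSystem fs, Path path) throws IOException {
    try {
      return fs.getFileStatus(path);
    } catch (FileNotFoundException e) {
      // Path does not exist; the caller can map this to a null / ENOENT result.
      return null;
    }
  }
}
{code}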







[jira] [Created] (HDFS-13915) replace datanode failed because of NameNodeRpcServer#getAdditionalDatanode returning excessive datanodeInfo

2018-09-13 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-13915:


 Summary: replace datanode failed because of  
NameNodeRpcServer#getAdditionalDatanode returning excessive datanodeInfo
 Key: HDFS-13915
 URL: https://issues.apache.org/jira/browse/HDFS-13915
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
 Environment: 

Reporter: Jiandan Yang 
Assignee: Jiandan Yang 


Consider the following situation:
1. Create a file with the ALL_SSD storage policy.

2. The chosen storage types fall back to [SSD, SSD, DISK] due to a lack of SSD space.

3. The client calls NameNodeRpcServer#getAdditionalDatanode while recovering the write pipeline and replacing a bad datanode.

4. BlockPlacementPolicyDefault#chooseTarget calls StoragePolicy#chooseStorageTypes(3, [SSD,DISK], none, false), and chooseStorageTypes returns [SSD, SSD].

5. numOfReplicas = requiredStorageTypes.size() sets numOfReplicas to 2, so two additional datanodes are chosen.

6. BlockPlacementPolicyDefault#chooseTarget therefore returns four datanodes to the client.

7. DataStreamer#findNewDatanode finds nodes.length != original.length + 1 and throws an IOException, so the write finally fails (see the sketch after the log below).

The client warn log is:
{code:java}

WARN [DataStreamer for file 
/home/yarn/opensearch/in/data/120141286/0_65535/table/ucs_process/MANIFEST-093545
 block BP-1742758844-11.138.8.184-1483707043031:blk_7086344902_6012765313] 
org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception

java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[11.138.5.4:50010,DS-04826cfc-1885-4213-a58b-8606845c5c42,SSD],
 
DatanodeInfoWithStorage[11.138.5.9:50010,DS-f6d8eb8b-2550-474b-a692-c991d7a6f6b3,SSD],
 
DatanodeInfoWithStorage[11.138.5.153:50010,DS-f5d77ca0-6fe3-4523-8ca8-5af975f845b6,SSD],
 
DatanodeInfoWithStorage[11.138.9.156:50010,DS-0d15ea12-1bad--84f7-1a4917a1e194,DISK]],
 
original=[DatanodeInfoWithStorage[11.138.5.4:50010,DS-04826cfc-1885-4213-a58b-8606845c5c42,SSD],
 
DatanodeInfoWithStorage[11.138.9.156:50010,DS-0d15ea12-1bad--84f7-1a4917a1e194,DISK]]).
 The current failed datanode replacement policy is DEFAULT, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration.

{code}
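
A simplified sketch of the client-side check that fails in step 7 (loosely modeled on DataStreamer#findNewDatanode; the method and variable names are illustrative):

{code:java}
public class PipelineRecoveryCheck {
  /**
   * Pipeline recovery expects the NameNode to return exactly one node more than
   * the surviving pipeline. In the scenario above original.length is 2 (the
   * surviving SSD and DISK replicas) but 4 nodes come back, because numOfReplicas
   * was set to requiredStorageTypes.size() == 2, so the check throws and the
   * write fails.
   */
  static void checkNewDatanode(int originalLength, int returnedLength)
      throws java.io.IOException {
    if (returnedLength != originalLength + 1) {
      throw new java.io.IOException("Failed to replace a bad datanode: expected "
          + (originalLength + 1) + " nodes but got " + returnedLength);
    }
  }
}
{code}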






[jira] [Created] (HDFS-12814) Add blockId when warning slow mirror/disk in BlockReceiver

2017-11-14 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12814:


 Summary: Add blockId when warning slow mirror/disk in BlockReceiver
 Key: HDFS-12814
 URL: https://issues.apache.org/jira/browse/HDFS-12814
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Jiandan Yang 
Assignee: Jiandan Yang 
Priority: Minor


HDFS-11603 added the downstream DataNode IDs and the volume path to these warnings.
To make debugging easier, those warning logs should also include the blockId.
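
A hedged sketch of what the enriched warning could look like (the message text and parameters are illustrative, not the actual BlockReceiver code):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SlowMirrorWarning {
  private static final Logger LOG = LoggerFactory.getLogger(SlowMirrorWarning.class);

  /** Illustrative warning that adds the block id to the existing fields. */
  static void warnSlowMirror(long durationMs, long thresholdMs,
      String downstreamDNs, String volumePath, long blockId) {
    if (durationMs > thresholdMs) {
      LOG.warn("Slow BlockReceiver write packet to mirror took {}ms (threshold={}ms),"
          + " downstream DNs={}, volume={}, blockId={}",
          durationMs, thresholdMs, downstreamDNs, volumePath, blockId);
    }
  }
}
{code}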






[jira] [Created] (HDFS-12757) DeadLock Happened Between DFSOutputStream and LeaseRenewer when LeaseRenewer#renew SocketTimeException

2017-11-02 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12757:


 Summary: DeadLock Happened Between DFSOutputStream and 
LeaseRenewer when LeaseRenewer#renew SocketTimeException
 Key: HDFS-12757
 URL: https://issues.apache.org/jira/browse/HDFS-12757
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Jiandan Yang 
Priority: Major


The Java stack is:
{code:java}
Found one Java-level deadlock:
=
"Topology-2 (735/2000)":
  waiting to lock monitor 0x7fff4523e6e8 (object 0x0005d3521078, a 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer),
  which is held by "LeaseRenewer:admin@na61storage"
"LeaseRenewer:admin@na61storage":
  waiting to lock monitor 0x7fff5d41e838 (object 0x0005ec0dfa88, a 
org.apache.hadoop.hdfs.DFSOutputStream),
  which is held by "Topology-2 (735/2000)"

Java stack information for the threads listed above:
===
"Topology-2 (735/2000)":
at 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:227)
- waiting to lock <0x0005d3521078> (a 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
at 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:86)
at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:467)
at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:479)
at 
org.apache.hadoop.hdfs.DFSOutputStream.setClosed(DFSOutputStream.java:776)
at 
org.apache.hadoop.hdfs.DFSOutputStream.closeThreads(DFSOutputStream.java:791)
at 
org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:848)
- locked <0x0005ec0dfa88> (a org.apache.hadoop.hdfs.DFSOutputStream)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:805)
- locked <0x0005ec0dfa88> (a org.apache.hadoop.hdfs.DFSOutputStream)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
..
"LeaseRenewer:admin@na61storage":
at 
org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:750)
- waiting to lock <0x0005ec0dfa88> (a 
org.apache.hadoop.hdfs.DFSOutputStream)
at 
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:586)
at 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:453)
- locked <0x0005d3521078> (a 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
at 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:76)
at 
org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:310)
at java.lang.Thread.run(Thread.java:834)

Found 1 deadlock.
{code}
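
A greatly simplified illustration of the lock ordering in the trace above (the classes below only stand in for LeaseRenewer and DFSOutputStream):

{code:java}
// Thread A (the renewer) takes the Renewer lock, then wants the Stream lock;
// thread B (the stream close) takes the Stream lock, then wants the Renewer lock.
class Renewer {                         // stands in for LeaseRenewer
  synchronized void run(Stream s) {     // holds the Renewer monitor ...
    s.abort();                          // ... then needs the Stream monitor
  }
  synchronized void addClient() { }
}

class Stream {                          // stands in for DFSOutputStream
  private final Renewer renewer;
  Stream(Renewer r) { this.renewer = r; }
  synchronized void close() {           // holds the Stream monitor ...
    renewer.addClient();                // ... then needs the Renewer monitor -> deadlock
  }
  synchronized void abort() { }
}
{code}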






[jira] [Created] (HDFS-12748) Standby NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY

2017-10-30 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12748:


 Summary: Standby NameNode memory leak when accessing webhdfs 
GETHOMEDIRECTORY
 Key: HDFS-12748
 URL: https://issues.apache.org/jira/browse/HDFS-12748
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.2
Reporter: Jiandan Yang 


In our production environment the standby NN often does full GC. Using MAT we found that the largest object is FileSystem$Cache, which contains 7,844,890 DistributedFileSystem instances.
Looking at the call hierarchy of FileSystem.get(), I found that only NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why it creates a different DistributedFileSystem every time instead of getting a FileSystem from the cache.

{code:java}
case GETHOMEDIRECTORY: {
  final String js = JsonUtil.toJsonString("Path",
  FileSystem.get(conf != null ? conf : new Configuration())
  .getHomeDirectory().toUri().getPath());
  return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
}
{code}
When we close the FileSystem after handling GETHOMEDIRECTORY, the NN no longer does full GC:

{code:java}
case GETHOMEDIRECTORY: {
  FileSystem fs = null;
  try {
fs = FileSystem.get(conf != null ? conf : new Configuration());
final String js = JsonUtil.toJsonString("Path",
fs.getHomeDirectory().toUri().getPath());
return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
  } finally {
if (fs != null) {
  fs.close();
}
  }
}
{code}







[jira] [Created] (HDFS-12638) NameNode exit due to NPE

2017-10-11 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12638:


 Summary: NameNode exit due to NPE
 Key: HDFS-12638
 URL: https://issues.apache.org/jira/browse/HDFS-12638
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.2
Reporter: Jiandan Yang 


The active NameNode exits due to an NPE. By process of elimination I think the BlockCollection 'bc' is null, but I do not know why. Looking at the history, I guess this issue may have been introduced by [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754].

The NN logs are as follows:
{code:java}
2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor 
thread received Runtime exception.
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
at java.lang.Thread.run(Thread.java:834)
{code}
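
Until the root cause is known, one possible mitigation is a defensive check so that a null BlockCollection skips the block instead of killing the ReplicationMonitor thread. A hedged sketch (illustrative only, not the actual ReplicationWork code):

{code:java}
public class ReplicationWorkGuard {
  /** Return true only when the block still has a live BlockCollection. */
  static <T> boolean shouldChooseTargets(T bc, org.slf4j.Logger log, long blockId) {
    if (bc == null) {
      // The file may have been deleted concurrently; skip this block rather than
      // letting the ReplicationMonitor thread die with an NPE.
      log.warn("Skipping replication work for block {}: BlockCollection is null", blockId);
      return false;
    }
    return true;
  }
}
{code}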






[jira] [Created] (HDFS-12446) FSNamesystem#internalReleaseLease throw IllegalStateException

2017-09-13 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12446:


 Summary: FSNamesystem#internalReleaseLease throw 
IllegalStateException
 Key: HDFS-12446
 URL: https://issues.apache.org/jira/browse/HDFS-12446
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.1
Reporter: Jiandan Yang 



{code:java}
2017-09-14 10:21:32,042 INFO 
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
DFSClient_NONMAPREDUCE_-275421369_84, pending creates: 7] has expired hard limit
2017-09-14 10:21:32,042 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
Holder: DFSClient_NONMAPREDUCE_-275421369_84, pending creates: 7], 
src=/user/ads/af_base_n_adf_p4p_pv/data/55f57d72-1542-4acf-b2d4-08af65b0e859
2017-09-14 10:21:32,042 WARN 
org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable:
java.lang.IllegalStateException: Unexpected block state: 
blk_1265519060_203004758 is COMMITTED but not COMPLETE, 
file=55f57d72-1542-4acf-b2d4-08af65b0e859 (INodeFile), 
blocks=[blk_1265519060_203004758] (i=0)
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:172)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.assertAllBlocksComplete(INodeFile.java:218)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.toCompleteFile(INodeFile.java:207)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.finalizeINodeFileUnderConstruction(FSNamesystem.java:3312)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3184)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)
at java.lang.Thread.run(Thread.java:834)
{code}






[jira] [Created] (HDFS-12390) Supporting DNS to switch mapping

2017-09-03 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12390:


 Summary: Supporting DNS to switch mapping
 Key: HDFS-12390
 URL: https://issues.apache.org/jira/browse/HDFS-12390
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs, hdfs-client
Reporter: Jiandan Yang 
Assignee: Jiandan Yang 


As described in [HDFS-12200|https://issues.apache.org/jira/browse/HDFS-12200], ScriptBasedMapping may drive NN CPU usage to 100%. ScriptBasedMapping runs a sub-process to get the rack info of DataNodes/clients, so we think it is a little heavy. We planned to use TableMapping, but TableMapping does not support refresh and cannot reload the rack info of newly added DataNodes.
So we implemented a mapping that supports refresh.
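
A rough sketch of the intended behavior, assuming a TableMapping-like mapping with a reload hook (the class and method names below are illustrative, not the actual patch):

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative host-to-rack table that can be reloaded at runtime. */
public class RefreshableTableMapping {
  private volatile Map<String, String> hostToRack = new ConcurrentHashMap<>();
  private final String tableFile;

  public RefreshableTableMapping(String tableFile) throws IOException {
    this.tableFile = tableFile;
    reload();
  }

  /** Re-read the table file, e.g. after new DataNodes are added. */
  public synchronized void reload() throws IOException {
    Map<String, String> fresh = new ConcurrentHashMap<>();
    try (BufferedReader reader = Files.newBufferedReader(Paths.get(tableFile))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length == 2) {
          fresh.put(parts[0], parts[1]);  // host -> rack
        }
      }
    }
    hostToRack = fresh;  // atomic swap, readers never see a partial table
  }

  /** Resolve hosts to racks; unknown hosts fall back to /default-rack. */
  public List<String> resolve(List<String> names) {
    List<String> racks = new ArrayList<>(names.size());
    for (String name : names) {
      racks.add(hostToRack.getOrDefault(name, "/default-rack"));
    }
    return racks;
  }
}
{code}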






[jira] [Created] (HDFS-12364) Compile Error:TestClientProtocolForPipelineRecovery#testUpdatePipeLineAfterDNReg

2017-08-28 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12364:


 Summary: Compile 
Error:TestClientProtocolForPipelineRecovery#testUpdatePipeLineAfterDNReg
 Key: HDFS-12364
 URL: https://issues.apache.org/jira/browse/HDFS-12364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.2
Reporter: Jiandan Yang 
Assignee: Jiandan Yang 


The compile error is at the line: dn1.setHeartbeatsDisabledForTests(true)






[jira] [Created] (HDFS-12348) disable removing block to trash while rolling upgrade

2017-08-23 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12348:


 Summary: disable removing block to trash while rolling upgrade
 Key: HDFS-12348
 URL: https://issues.apache.org/jira/browse/HDFS-12348
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode
Reporter: Jiandan Yang 
Assignee: Jiandan Yang 


During a rolling upgrade the DataNode moves block files and meta files to trash and only deletes them when finalize is executed. But frequent creation and deletion of files (e.g. HBase compactions) can fill the disk, and in production we will not roll back the namespace even if the rolling upgrade fails. Disabling the DataNode trash may be a good way to avoid filling the disk.






[jira] [Created] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid cpu utilization is too high

2017-07-26 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12200:


 Summary: Optimize CachedDNSToSwitchMapping to avoid cpu 
utilization is too high
 Key: HDFS-12200
 URL: https://issues.apache.org/jira/browse/HDFS-12200
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Jiandan Yang 


1. Background:
Our Hadoop cluster separates storage and compute. HDFS is deployed on 600+ machines, while YARN is deployed on another machine pool that runs both offline jobs and online services. YARN's offline jobs access HDFS, but the machines used for offline jobs change dynamically because the online services have higher priority: when an online service is idle its machines are assigned to offline tasks, and when it is busy it takes the resources back from the offline jobs.
We found that NameNode CPU utilization sometimes reaches 90% or even 100%. In the worst case CPU utilization stays at 100% for a long time, writes to the JournalNodes time out, and the NameNode eventually hangs. The reason is that offline tasks running on a few hundred servers access HDFS at the same time; the NameNode resolves the rack of each client machine and starts several hundred sub-processes.

{code:java}
"process reaper"#10864 daemon prio=10 os_prio=0 tid=0x7fe270a31800 
nid=0x38d93 runnable [0x7fcdc36fc000]
   java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.waitForProcessExit(Native Method)
at java.lang.UNIXProcess.lambda$initStreams$4(UNIXProcess.java:301)
at java.lang.UNIXProcess$$Lambda$7/1447689627.run(Unknown Source)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:834)
{code}

Our configuration is as follows:
{code:java}
net.topology.node.switch.mapping.impl = ScriptBasedMapping, 
net.topology.script.file.name = 'a python script'
{code}



2. Optimization
To solve these problems we optimized CachedDNSToSwitchMapping (a rough sketch follows the list):
(1) Add the DataNode IP list to the file configured by dfs.hosts. When the NameNode starts it preloads the DataNode rack information into the cache, resolving a batch of hosts per script invocation (the batch size is net.topology.script.number.args, default 100).

(2) Step (1) ensures that the cache holds the racks of all DataNodes, so if the cache misses, the host must be a client machine and we directly return /default-rack.

(3) Each time new DataNodes are added, add their IP addresses to the file specified by dfs.hosts and run bin/hdfs dfsadmin -refreshNodes; this puts the newly added DataNodes' racks into the cache.

(4) Add a new configuration item, dfs.namenode.topology.resolve-non-cache-host: setting it to false enables the behavior above, setting it to true turns it off, and the default is true to keep compatibility.
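
A rough sketch of steps (1) and (2); the class below is illustrative and not the actual patch, with dfs.hosts parsing and the topology script stubbed out:

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative cache: preload DataNode racks, answer client hosts with /default-rack. */
public class PreloadedRackCache {
  /** Stand-in for the configured topology script. */
  public interface RackScript {
    List<String> resolve(List<String> hosts);
  }

  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final boolean resolveNonCacheHost;  // dfs.namenode.topology.resolve-non-cache-host

  public PreloadedRackCache(boolean resolveNonCacheHost) {
    this.resolveNonCacheHost = resolveNonCacheHost;
  }

  /** Steps (1) and (3): called at startup and on -refreshNodes with the dfs.hosts list. */
  public void preload(List<String> datanodeIps, RackScript script) {
    int batch = 100;  // per-invocation batch size (net.topology.script.number.args)
    for (int i = 0; i < datanodeIps.size(); i += batch) {
      List<String> slice = datanodeIps.subList(i, Math.min(i + batch, datanodeIps.size()));
      List<String> racks = script.resolve(slice);
      for (int j = 0; j < slice.size(); j++) {
        cache.put(slice.get(j), racks.get(j));
      }
    }
  }

  /** Step (2): a cache miss means a client host, so avoid forking the script for it. */
  public String getRack(String host, RackScript script) {
    String rack = cache.get(host);
    if (rack != null) {
      return rack;
    }
    return resolveNonCacheHost
        ? script.resolve(Collections.singletonList(host)).get(0)
        : "/default-rack";
  }
}
{code}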







[jira] [Reopened] (HDFS-12177) NameNode exits due to setting BlockPlacementPolicy loglevel to Debug

2017-07-20 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  reopened HDFS-12177:
--

> NameNode exits due to  setting BlockPlacementPolicy loglevel to Debug
> -
>
> Key: HDFS-12177
> URL: https://issues.apache.org/jira/browse/HDFS-12177
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement
>Affects Versions: 2.8.1
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
> Attachments: HDFS_9668_1.patch
>
>
> NameNode exits because the ReplicationMonitor thread internally throws an NPE.
> The reason is that the builder field is not initialized when logging at debug
> level.
> Solution: before appending, check whether the builder is null.
> {code:java}
> if (LOG.isDebugEnabled()) {
>   builder = debugLoggingBuilder.get();
>   builder.setLength(0);
>   builder.append("[");
> }
> some other codes ...
> if (LOG.isDebugEnabled()) {
>   builder.append("\nNode ").append(NodeBase.getPath(chosenNode))
>   .append(" [");
> }
> some other codes ...
> if (LOG.isDebugEnabled()) {
>   builder.append("\n]");
> }
> {code}
> NN exception log is :
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:722)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:689)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:640)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:608)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:483)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:390)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:266)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:119)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3768)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3720)
> at java.lang.Thread.run(Thread.java:834)
> {code}






[jira] [Created] (HDFS-12177) NameNode exits due to setting BlockPlacementPolicy loglevel to Debug

2017-07-20 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12177:


 Summary: NameNode exits due to  setting BlockPlacementPolicy 
loglevel to Debug
 Key: HDFS-12177
 URL: https://issues.apache.org/jira/browse/HDFS-12177
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: block placement
Affects Versions: 2.8.1
Reporter: Jiandan Yang 


NameNode exits because the ReplicationMonitor thread internally throws an NPE.
The reason is that the builder field is not initialized when logging at debug level.
Solution: before appending, check whether the builder is null (a guarded sketch follows the code below).

{code:java}
if (LOG.isDebugEnabled()) {
  builder = debugLoggingBuilder.get();
  builder.setLength(0);
  builder.append("[");
}
some other codes ...
if (LOG.isDebugEnabled()) {
  builder.append("\nNode ").append(NodeBase.getPath(chosenNode))
  .append(" [");
}
some other codes ...
if (LOG.isDebugEnabled()) {
  builder.append("\n]");
}
{code}
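
A minimal sketch of the proposed guard (illustrative only; it null-checks the builder before each append so the appends stay safe even if the log level changes mid-call):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugBuilderGuard {
  private static final Log LOG = LogFactory.getLog(DebugBuilderGuard.class);
  private static final ThreadLocal<StringBuilder> debugLoggingBuilder =
      ThreadLocal.withInitial(StringBuilder::new);

  static void example(String chosenNodePath) {
    StringBuilder builder = null;
    if (LOG.isDebugEnabled()) {
      builder = debugLoggingBuilder.get();
      builder.setLength(0);
      builder.append("[");
    }
    // ... choose targets ...
    if (LOG.isDebugEnabled() && builder != null) {
      builder.append("\nNode ").append(chosenNodePath).append(" [");
    }
    // ... choose more targets ...
    if (LOG.isDebugEnabled() && builder != null) {
      builder.append("\n]");
      LOG.debug(builder.toString());
    }
  }
}
{code}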

The NN exception log is:

{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:722)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:689)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:640)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:608)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:483)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:390)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:266)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:119)
at 
org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3768)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3720)
at java.lang.Thread.run(Thread.java:834)

{code}



