[jira] [Created] (HDFS-13832) EC: No administrative command provided to delete a user-defined erasure coding policy

2018-08-17 Thread Souryakanta Dwivedy (JIRA)
Souryakanta Dwivedy created HDFS-13832:
--

 Summary: EC: No administrative command provided to delete a 
user-defined erasure coding policy
 Key: HDFS-13832
 URL: https://issues.apache.org/jira/browse/HDFS-13832
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.0.0
 Environment: 3 node SUSE linux cluster
Reporter: Souryakanta Dwivedy
 Attachments: Delete_ec_policy.PNG

No administrative command is provided to delete a user-defined erasure coding 
policy.

Steps:

---
 - Create a directory.
 - Add 64 user-defined EC policies, which occupy the ID range [64, 127]. Beyond 
that the system will not allow any more policies to be added.
 - Enable an EC policy and then set it on the directory.
 - Disable the policy and check the state of the policy with -listPolicies.
 - If the EC policy is in the disabled state, the system will not allow you to 
set it on any directory.
 - Remove the EC policy and check the state of the policy with -listPolicies. 
The state is merely set to "removed", but the policy is still present in the list.
 - If the EC policy is in the removed state, the system will not allow you to 
set it on any directory.
 - There is effectively no difference between the disabled and removed states.
 - After adding 64 user-defined EC policies, if a user wants to delete a policy 
that is no longer usable or was added incorrectly, and to add a new desired 
user-defined EC policy in its place, this is not possible because no delete 
option is provided. Only the remove-policy option is given, and it does not 
remove a user-defined policy; it only sets the policy state to removed.

Actual output:
 
 No administrative command is provided to delete a user-defined erasure coding 
policy. With "-removePolicy" we can only set a policy's state to removed; we 
cannot delete the user-defined EC policy. After adding 64 user-defined EC 
policies, if a user wants to delete a policy and add a new one in its place, 
there is no administrative provision to perform this operation.
 
 Expected output:
 
 Either "-removePolicy" should actually remove the user-defined EC policy 
instead of only changing its state to removed, or a separate administrative 
command should be provided to delete a user-defined EC policy.
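
A minimal, hedged sketch of the reported behaviour, written against the Hadoop 
3.x DistributedFileSystem client API; the policy name "RS-12-4-1024k", the path 
"/ecdir", and the assumption that fs.defaultFS points at an HDFS cluster are 
illustrative only:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class EcRemoveDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;

      // Assume "RS-12-4-1024k" was added earlier as a user-defined policy
      // (e.g. via `hdfs ec -addPolicies`), occupying one of the 64 ID slots.
      dfs.enableErasureCodingPolicy("RS-12-4-1024k");
      dfs.mkdirs(new Path("/ecdir"));
      dfs.setErasureCodingPolicy(new Path("/ecdir"), "RS-12-4-1024k");

      dfs.disableErasureCodingPolicy("RS-12-4-1024k");

      // "Removing" only flips the policy state to REMOVED: the policy still
      // shows up in `hdfs ec -listPolicies` and its ID slot is never freed,
      // which is the gap this issue describes.
      dfs.removeErasureCodingPolicy("RS-12-4-1024k");
    }
  }
}
{code}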






[jira] [Resolved] (HDDS-244) Synchronize PutKey and WriteChunk requests in Ratis Server

2018-08-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-244.
--
Resolution: Fixed

> Synchronize PutKey and WriteChunk requests in Ratis Server
> --
>
> Key: HDDS-244
> URL: https://issues.apache.org/jira/browse/HDDS-244
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.2.1
>
>
> In Ratis, all WriteChunk requests are submitted with Replication_Majority 
> semantics: from Ratis's point of view, command execution completes once any 
> 2 of the 3 datanodes have executed the request. It can therefore happen that 
> on one of the followers a PutKey starts executing while WriteChunk requests 
> for the same block are still in progress. Synchronization needs to be 
> enforced between a PutKey and its corresponding WriteChunk requests in the 
> ContainerStateMachine. This Jira aims to address that.
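
A minimal, hedged sketch (not the actual ContainerStateMachine change) of the 
kind of per-block synchronization described above: track the futures of 
in-flight WriteChunks per block and let a PutKey wait for all of them before it 
executes. BlockWriteTracker and its method names are illustrative only:
{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class BlockWriteTracker {
  private final Map<Long, List<CompletableFuture<Void>>> writeChunkFutures =
      new ConcurrentHashMap<>();

  /** Record an in-flight WriteChunk for the given block. */
  public void registerWriteChunk(long blockId, CompletableFuture<Void> future) {
    writeChunkFutures
        .computeIfAbsent(blockId, id -> new CopyOnWriteArrayList<>())
        .add(future);
  }

  /**
   * A PutKey for blockId should execute only after every registered
   * WriteChunk for that block has completed on this replica.
   */
  public CompletableFuture<Void> awaitWriteChunks(long blockId) {
    List<CompletableFuture<Void>> pending = writeChunkFutures.remove(blockId);
    if (pending == null || pending.isEmpty()) {
      return CompletableFuture.completedFuture(null);
    }
    return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
  }
}
{code}
A PutKey handler would then chain its own apply step onto 
awaitWriteChunks(blockId) instead of running immediately.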






[jira] [Resolved] (HDFS-11821) BlockManager.getMissingReplOneBlocksCount() does not report correct value if corrupt file with replication factor of 1 gets deleted

2018-08-17 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HDFS-11821.
-
Resolution: Duplicate

This took too long to get reviewed, so a newer Jira, HDFS-13048, ended up 
fixing it. Thank you all for the pointers.

> BlockManager.getMissingReplOneBlocksCount() does not report correct value if 
> corrupt file with replication factor of 1 gets deleted
> ---
>
> Key: HDFS-11821
> URL: https://issues.apache.org/jira/browse/HDFS-11821
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.0, 3.0.0-alpha2
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-11821-1.patch, HDFS-11821-2.patch
>
>
> *BlockManager* keeps a separate metric for the number of missing blocks with 
> a replication factor of 1. This is currently returned by the 
> *BlockManager.getMissingReplOneBlocksCount()* method, and that's 
> what is displayed in the attribute below from *dfsadmin -report* (in the 
> example below, there is one corrupt block that belongs to a file with a 
> replication factor of 1):
> {noformat}
> ...
> Missing blocks (with replication factor 1): 1
> ...
> {noformat}
> However, if the related file gets deleted (for instance, using the hdfs fsck 
> -delete option), this metric never gets updated, and *dfsadmin -report* will 
> keep reporting a missing block even though the file does not exist anymore. 
> The only workaround available is to restart the NN so that this metric gets 
> cleared.
> This can easily be reproduced by forcing corruption of a file with 
> replication factor 1, as follows:
> 1) Put a file into hdfs with replication factor 1:
> {noformat}
> $ hdfs dfs -Ddfs.replication=1 -put test_corrupt /
> $ hdfs dfs -ls /
> -rw-r--r--   1 hdfs supergroup 19 2017-05-10 09:21 /test_corrupt
> {noformat}
> 2) Find related block for the file and delete it from DN:
> {noformat}
> $ hdfs fsck /test_corrupt -files -blocks -locations
> ...
> /test_corrupt 19 bytes, 1 block(s):  OK
> 0. BP-782213640-172.31.113.82-1494420317936:blk_1073742742_1918 len=19 
> Live_repl=1 
> [DatanodeInfoWithStorage[172.31.112.178:20002,DS-a0dc0b30-a323-4087-8c36-26ffdfe44f46,DISK]]
> Status: HEALTHY
> ...
> $ find /dfs/dn/ -name blk_1073742742*
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta
> $ rm -rf 
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742
> $ rm -rf 
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta
> {noformat}
> 3) Running fsck will report the corruption as expected:
> {noformat}
> $ hdfs fsck /test_corrupt -files -blocks -locations
> ...
> /test_corrupt 19 bytes, 1 block(s): 
> /test_corrupt: CORRUPT blockpool BP-782213640-172.31.113.82-1494420317936 
> block blk_1073742742
>  MISSING 1 blocks of total size 19 B
> ...
> Total blocks (validated): 1 (avg. block size 19 B)
>   
>   UNDER MIN REPL'D BLOCKS:1 (100.0 %)
>   dfs.namenode.replication.min:   1
>   CORRUPT FILES:  1
>   MISSING BLOCKS: 1
>   MISSING SIZE:   19 B
>   CORRUPT BLOCKS: 1
> ...
> {noformat}
> 4) Same for *dfsadmin -report*
> {noformat}
> $ hdfs dfsadmin -report
> ...
> Under replicated blocks: 1
> Blocks with corrupt replicas: 0
> Missing blocks: 1
> Missing blocks (with replication factor 1): 1
> ...
> {noformat}
> 5) Running the *fsck -delete* option does make fsck report correct 
> information about the corrupt block, but dfsadmin still shows it:
> {noformat}
> $ hdfs fsck /test_corrupt -delete
> ...
> $ hdfs fsck /
> ...
> The filesystem under path '/' is HEALTHY
> ...
> $ hdfs dfsadmin -report
> ...
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> Missing blocks (with replication factor 1): 1
> ...
> {noformat}
> The problem seems to be in the *BlockManager.removeBlock()* method, which in turn 
> uses the util class *LowRedundancyBlocks* that classifies blocks according to their 
> current replication level, including blocks currently marked as corrupt. 
> The related metric shown by *dfsadmin -report* for corrupt blocks with 
> replication factor 1 is tracked in this *LowRedundancyBlocks*. Whenever a 
> block is marked as corrupt and it has a replication factor of 1, the related 
> metric is updated. When removing the block, though, 
> *BlockManager.removeBlock()* is calling *LowRedundancyBlocks.remove(Bloc
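
A minimal, hedged illustration of the bookkeeping gap described above; this is 
not the real BlockManager/LowRedundancyBlocks code, and the names 
(MissingReplOneTracker, markCorrupt, remove) are made up:
{code:java}
import java.util.HashSet;
import java.util.Set;

public class MissingReplOneTracker {
  private final Set<Long> corruptReplOneBlocks = new HashSet<>();

  /** Called when a block with replication factor 1 is found corrupt. */
  public synchronized void markCorrupt(long blockId, short replication) {
    if (replication == 1) {
      corruptReplOneBlocks.add(blockId);
    }
  }

  /**
   * Called when a block is removed (e.g. its file is deleted by fsck -delete).
   * Skipping this step is the kind of gap the issue describes: the counter
   * reported as "Missing blocks (with replication factor 1)" never goes down.
   */
  public synchronized void remove(long blockId) {
    corruptReplOneBlocks.remove(blockId);
  }

  public synchronized int getMissingReplOneBlocksCount() {
    return corruptReplOneBlocks.size();
  }
}
{code}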

Apache Hadoop qbt Report: trunk+JDK8 on Windows/x64

2018-08-17 Thread Apache Jenkins Server
For more details, see https://builds.apache.org/job/hadoop-trunk-win/561/

[Aug 16, 2018 10:50:43 PM] (aw) YETUS-662. integration test runner
[Aug 17, 2018 12:48:00 AM] (aw) YETUS-661. ant/gradle/maven assumes container 
$HOME and host $HOME are
[Aug 16, 2018 4:46:37 PM] (eyang) YARN-8474. Fixed ApiServiceClient kerberos 
negotiation.   
[Aug 16, 2018 4:58:46 PM] (stevel) HADOOP-15642. Update aws-sdk version to 
1.11.375. Contributed by Steve
[Aug 16, 2018 6:05:19 PM] (msingh) HDDS-179. CloseContainer/PutKey command 
should be syncronized with write
[Aug 16, 2018 10:00:45 PM] (templedf) HDFS-13746. Still occasional "Should be 
different group" failure in
[Aug 16, 2018 10:41:58 PM] (eyang) YARN-8667. Cleanup symlinks when container 
restarted by NM.   
[Aug 16, 2018 11:29:38 PM] (weichiu) HDFS-10240. Race between 
close/recoverLease leads to missing block.
[Aug 17, 2018 5:18:52 AM] (xyao) HDDS-119. Skip Apache license header check for 
some ozone doc scripts.
[Aug 17, 2018 5:42:03 AM] (xiao) HADOOP-15655. Enhance KMS client retry 
behavior. Contributed by Kitti
[Aug 17, 2018 5:42:10 AM] (rohithsharmaks) YARN-8612. Fix NM Collector Service 
Port issue in YarnConfiguration.
[Aug 17, 2018 6:14:21 AM] (xiao) HDFS-13747. Statistic for list_located_status 
is incremented incorrectly
[Aug 17, 2018 9:10:29 AM] (elek) HADOOP-8807. Update README and website to 
reflect HADOOP-8662.
[Aug 17, 2018 9:52:55 AM] (brahma) HDFS-13790. RBF: Move ClientProtocol APIs to 
its own module. Contributed


ERROR: File 'out/email-report.txt' does not exist

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Created] (HDDS-363) Faster datanode registration during the first startup

2018-08-17 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-363:
-

 Summary: Faster datanode registration during the first startup
 Key: HDDS-363
 URL: https://issues.apache.org/jira/browse/HDDS-363
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Elek, Marton
 Fix For: 0.2.1


During the first startup we usually need to wait about 30 s until the SCM is 
usable. The datanode registration is a multi-step process (request/response 
+ request/response) and we need to wait for the next HB to finish the registration.

I propose using a higher HB frequency at startup (let's say every 2 seconds) 
and switching to the configured HB interval only at the end of the registration.

It also helps first-time users as it is less confusing (the datanode 
can be seen almost immediately on the UI).

It would also help me a lot during testing (yes, I can lower the configured HB 
interval, but in that case it's harder to follow the later HBs).
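
A rough, hedged sketch of the proposal (not actual Ozone datanode code); the 
class name, the 2-second bootstrap value and the placeholder sendHeartbeat() 
are illustrative assumptions:
{code:java}
import java.util.concurrent.TimeUnit;

public class HeartbeatLoop {
  private static final long BOOTSTRAP_INTERVAL_MS = 2_000; // fast, pre-registration
  private final long configuredIntervalMs;                 // e.g. 30_000 from config
  private volatile boolean registered = false;

  public HeartbeatLoop(long configuredIntervalMs) {
    this.configuredIntervalMs = configuredIntervalMs;
  }

  public void run() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      boolean ack = sendHeartbeat();          // request/response round trip
      if (!registered && ack) {
        registered = true;                    // registration completed
      }
      // Heartbeat quickly until registered, then use the configured interval.
      long sleepMs = registered ? configuredIntervalMs : BOOTSTRAP_INTERVAL_MS;
      TimeUnit.MILLISECONDS.sleep(sleepMs);
    }
  }

  /** Placeholder for the real SCM heartbeat/registration RPC. */
  private boolean sendHeartbeat() {
    return true;
  }
}
{code}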






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-08-17 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/

[Aug 16, 2018 10:44:18 AM] (yqlin) HDFS-13829. Remove redundant condition 
judgement in
[Aug 16, 2018 3:06:17 PM] (jlowe) YARN-8656. container-executor should not 
write cgroup tasks files for
[Aug 16, 2018 4:46:37 PM] (eyang) YARN-8474. Fixed ApiServiceClient kerberos 
negotiation.   
[Aug 16, 2018 4:58:46 PM] (stevel) HADOOP-15642. Update aws-sdk version to 
1.11.375. Contributed by Steve
[Aug 16, 2018 6:05:19 PM] (msingh) HDDS-179. CloseContainer/PutKey command 
should be syncronized with write
[Aug 16, 2018 10:00:45 PM] (templedf) HDFS-13746. Still occasional "Should be 
different group" failure in
[Aug 16, 2018 10:41:58 PM] (eyang) YARN-8667. Cleanup symlinks when container 
restarted by NM.   
[Aug 16, 2018 11:29:38 PM] (weichiu) HDFS-10240. Race between 
close/recoverLease leads to missing block.




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
 
   Unread field:FSBasedSubmarineStorageImpl.java:[line 39] 
   Found reliance on default encoding in 
org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.generateCommandLaunchScript(RunJobParameters,
 TaskType, Component):in 
org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.generateCommandLaunchScript(RunJobParameters,
 TaskType, Component): new java.io.FileWriter(File) At 
YarnServiceJobSubmitter.java:[line 192] 
   
org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.generateCommandLaunchScript(RunJobParameters,
 TaskType, Component) may fail to clean up java.io.Writer on checked exception 
Obligation to clean up resource created at YarnServiceJobSubmitter.java:to 
clean up java.io.Writer on checked exception Obligation to clean up resource 
created at YarnServiceJobSubmitter.java:[line 192] is not discharged 
   
org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceUtils.getComponentArrayJson(String,
 int, String) concatenates strings using + in a loop At 
YarnServiceUtils.java:using + in a loop At YarnServiceUtils.java:[line 72] 

Failed CTEST tests :

   test_test_libhdfs_threaded_hdfs_static 
   test_libhdfs_threaded_hdfspp_test_shim_static 

Failed junit tests :

   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart 
   hadoop.mapred.TestMRTimelineEventHandling 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/diff-compile-javac-root.txt
  [328K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/diff-checkstyle-root.txt
  [17M]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/diff-patch-shelldocs.txt
  [16K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/whitespace-eol.txt
  [9.4M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/whitespace-tabs.txt
  [1.1M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/xml.txt
  [4.0K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/branch-findbugs-hadoop-hdds_client.txt
  [68K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/branch-findbugs-hadoop-hdds_container-service.txt
  [60K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/branch-findbugs-hadoop-hdds_framework.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/871/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt
  [60K]
   
https://builds.apache.or

[jira] [Created] (HDFS-13833) Failed to choose from local rack (location = /default); the second replica is not found, retry choosing ramdomly

2018-08-17 Thread Henrique Barros (JIRA)
Henrique Barros created HDFS-13833:
--

 Summary: Failed to choose from local rack (location = /default); 
the second replica is not found, retry choosing ramdomly
 Key: HDFS-13833
 URL: https://issues.apache.org/jira/browse/HDFS-13833
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Henrique Barros


I'm having a random problem with block replication on Hadoop 2.6.0-cdh5.15.0, 
with Cloudera CDH-5.15.0-1.cdh5.15.0.p0.21.

In my case we are getting this error very randomly (after some hours) and with 
only one datanode (for now; we are trying this Cloudera cluster for a POC).
Here is the log:
{code:java}
Choosing random from 1 available nodes on node /default, scope=/default, 
excludedScope=null, excludeNodes=[]
2:38:20.527 PM  DEBUG   NetworkTopology 
Choosing random from 0 available nodes on node /default, scope=/default, 
excludedScope=null, excludeNodes=[192.168.220.53:50010]
2:38:20.527 PM  DEBUG   NetworkTopology 
chooseRandom returning null
2:38:20.527 PM  DEBUG   BlockPlacementPolicy
[
Node /default/192.168.220.53:50010 [
  Datanode 192.168.220.53:50010 is not chosen since the node is too busy (load: 
8 > 0.0).
2:38:20.527 PM  DEBUG   NetworkTopology 
chooseRandom returning 192.168.220.53:50010
2:38:20.527 PM  INFOBlockPlacementPolicy
Not enough replicas was chosen. Reason:{NODE_TOO_BUSY=1}
2:38:20.527 PM  DEBUG   StateChange 
closeFile: 
/mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/eef8bff6-75a9-43c1-ae93-4b1a9ca31ad9
 with 1 blocks is persisted to the file system
2:38:20.527 PM  DEBUG   StateChange 
*BLOCK* NameNode.addBlock: file 
/mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/1cfe900d-6f45-4b55-baaa-73c02ace2660
 fileId=129628869 for DFSClient_NONMAPREDUCE_467616914_65
2:38:20.527 PM  DEBUG   BlockPlacementPolicy
Failed to choose from local rack (location = /default); the second replica is 
not found, retry choosing ramdomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
 
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:784)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:694)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:601)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:561)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:464)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:395)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:270)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:142)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:158)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1715)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3505)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)

{code}
This part makes no sense at all:


{code:java}
load: 8 > 0.0{code}
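
For context, a hedged paraphrase of the kind of load check that produces this 
message (not the exact CDH/BlockPlacementPolicyDefault code): a node is 
rejected when its xceiver count exceeds roughly twice the cluster-average load, 
so a momentarily zero average makes any non-zero load fail the comparison:
{code:java}
public class LoadCheckSketch {
  static boolean isTooBusy(int nodeXceiverCount, double clusterAvgLoad) {
    double maxLoad = 2.0 * clusterAvgLoad;  // threshold derived from the average
    return nodeXceiverCount > maxLoad;      // e.g. 8 > 0.0 when the average is 0
  }

  public static void main(String[] args) {
    System.out.println(isTooBusy(8, 0.0));  // true -> "node is too busy (load: 8 > 0.0)"
  }
}
{code}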




[jira] [Created] (HDDS-364) Update open container replica information in SCM during DN register

2018-08-17 Thread Ajay Kumar (JIRA)
Ajay Kumar created HDDS-364:
---

 Summary: Update open container replica information in SCM during 
DN register
 Key: HDDS-364
 URL: https://issues.apache.org/jira/browse/HDDS-364
 Project: Hadoop Distributed Data Store
  Issue Type: New Feature
Reporter: Ajay Kumar


Update open container replica information in SCM during DN register.






[jira] [Created] (HDFS-13834) RBF: Connection creator thread should catch Throwable

2018-08-17 Thread CR Hota (JIRA)
CR Hota created HDFS-13834:
--

 Summary: RBF: Connection creator thread should catch Throwable
 Key: HDFS-13834
 URL: https://issues.apache.org/jira/browse/HDFS-13834
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: 
{code:java}
@Override
public void run() {
  while (this.running) {
    try {
      ConnectionPool pool = this.queue.take();
      try {
        int total = pool.getNumConnections();
        int active = pool.getNumActiveConnections();
        if (pool.getNumConnections() < pool.getMaxSize() &&
            active >= MIN_ACTIVE_RATIO * total) {
          ConnectionContext conn = pool.newConnection();
          pool.addConnection(conn);
        } else {
          LOG.debug("Cannot add more than {} connections to {}",
              pool.getMaxSize(), pool);
        }
      } catch (IOException e) {
        LOG.error("Cannot create a new connection", e);
      }
    } catch (InterruptedException e) {
      LOG.error("The connection creator was interrupted");
      this.running = false;
    }
  }
  // Note: any unchecked exception or Error escapes both catch blocks above and
  // terminates run(), killing the single connection creator thread.
}
{code}

Reporter: CR Hota
Assignee: CR Hota


The connection creator thread is a single thread that is responsible for creating 
all downstream namenode connections.

This is a very critical thread and hence should not die under 
exception/error scenarios.

We saw this behavior in production systems where the thread died, leaving the 
router process in a bad state.

The thread should also catch a generic error/exception (i.e. Throwable).
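
A hedged sketch of the suggested hardening (not the committed patch): the loop 
body is wrapped so any Throwable is logged instead of killing the single 
creator thread; the real ConnectionPool handling is replaced here by a plain 
Runnable task so the example is self-contained:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ResilientCreator implements Runnable {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private volatile boolean running = true;

  @Override
  public void run() {
    while (running) {
      try {
        Runnable task = queue.take();   // e.g. "create a connection for pool X"
        task.run();
      } catch (InterruptedException e) {
        System.err.println("The connection creator was interrupted");
        running = false;
      } catch (Throwable t) {           // the missing piece: survive any error
        System.err.println("Unexpected error in connection creator: " + t);
      }
    }
  }
}
{code}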


