[jira] [Reopened] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin reopened HDFS-11468:
--

> Ozone: SCM: Add Node Metrics for SCM
> 
>
> Key: HDFS-11468
> URL: https://issues.apache.org/jira/browse/HDFS-11468
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Xiaoyu Yao
>Assignee: Yiqun Lin
>Priority: Critical
>  Labels: OzonePostMerge
> Attachments: HDFS-11468-HDFS-7240.001.patch, 
> HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch, 
> HDFS-11468-HDFS-7240.004.patch
>
>
> This ticket is opened to add node metrics to SCM based on heartbeats, node 
> reports, and container reports from datanodes. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-19 Thread Junping Du
Hi folks,
 I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.

 Apache Hadoop 2.8.2 is the first stable release of the Hadoop 2.8 line and 
will be the latest stable/production release of Apache Hadoop. It includes 
315 newly fixed issues since 2.8.1, 69 of which are marked as blocker/critical.

  More information about the 2.8.2 release plan can be found here: 
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release

  New RC is available at: 
http://home.apache.org/~junping_du/hadoop-2.8.2-RC1

  The RC tag in git is: release-2.8.2-RC1, and the latest commit id is: 
66c47f2a01ad9637879e95f80c41f798373828fb

  The maven artifacts are available via 
repository.apache.org at: 
https://repository.apache.org/content/repositories/orgapachehadoop-1064

  Please try the release and vote; the vote will run for the usual 5 days, 
ending on 10/24/2017 at 6:00 pm PST.

Thanks,

Junping



Re: Use HAAdmin API

2017-10-19 Thread Arpit Agarwal
Mihir,

HAAdmin is a private interface. Most of its functionality is exposed via the 
‘hdfs haadmin’ command [1]. Will that work for you?

1. 
https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#haadmin
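
For reference, a minimal sketch of driving a failover through the public CLI rather than the private HAAdmin class; the service IDs nn1/nn2 are placeholders for whatever NameNode IDs your cluster configures:

```shell
# Check which NameNode is currently active (nn1/nn2 are placeholder IDs).
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Trigger a failover from the active NameNode (nn1) to the standby (nn2).
hdfs haadmin -failover nn1 nn2
```

A chaos action could shell out to these commands instead of linking against the private API.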




On 10/17/17, 4:28 AM, "Mihir Monani"  wrote:

I wanted to write a Chaos Action for the HBase Chaos Monkey (something like
RestartActiveMaster) which can trigger NN failover.

For that I was going through HAAdmin.java. Is there any way I can use
functions like HAAdmin#failover and HAAdmin#getServiceState from the HAAdmin
class?

Can someone guide me on how to use them?

Thanks,
Mihir Monani






Re: Permissions to edit Confluence Wiki

2017-10-19 Thread Ajay Kumar
Hi Team,

May I request access to https://cwiki.apache.org/confluence/display/HADOOP? 
I want to add a wiki page with error information regarding datanode startup in 
secure mode. (HADOOP-14969)

Thanks,
Ajay 

On 9/8/17, 7:53 PM, "Arun Suresh"  wrote:

Thanks Akira

Cheers
-Arun

On Fri, Sep 8, 2017 at 6:53 PM, Akira Ajisaka  wrote:

> You can get the privilege by sending an e-mail to the dev ML.
> I added it for you.
>
> Thanks,
> Akira
>
>
> On 2017/09/09 4:50, Arun Suresh wrote:
>
>> Hi folks
>>
>> How do we get access to edit the confluence wiki;
>> https://cwiki.apache.org/confluence/display/HADOOP ?
>>
>> We were hoping to update it with hadoop 2.9 release details.
>>
>> Cheers
>> -Arun
>>
>>




Re: [Update] Apache Hadoop 2.8.2 Release Status

2017-10-19 Thread Junping Du
A quick update: the last patch (YARN-7230) for docker container support in 2.8 
was committed yesterday. There are now no remaining blocker/critical issues for 
2.8.2, and I have verified that all landed commits match their JIRA fix versions. 
I am kicking off a new RC build and will publish the RC bits for a vote once the 
build finishes. In the meantime, please hold off on any commits to branch-2.8.2 
unless they are true blockers, and please ping me first. 

Thanks all for your patience!

Thanks,

Junping

From: Junping Du 
Sent: Friday, September 22, 2017 5:57 PM
To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Cc: Shane Kumpf; Miklos Szegedi; Varun Vasudev
Subject: [Update] Apache Hadoop 2.8.2 Release Status

Hi folks,
 I would like to give you a quick update on 2.8.2 release status:

- The first release candidate (RC0) was published over last weekend, but several 
docker container blockers (bugs, documentation, etc.) were reported, so we 
decided to cancel the RC0 vote.

- The remaining release blockers (for docker container support) are YARN-7034 
(just committed), YARN-6623, YARN-6930, and YARN-7230.
Shane, Miklos, and Varun are actively working on these. Appreciate the effort 
here!

- I will kick off a new release candidate (RC1) once these blockers are resolved.

To all committers: branch-2.8.2 is still open for blocker/critical issues, but 
for major/minor/trivial issues, please commit to branch-2.8 and mark the fix 
version as 2.8.3.

Thanks, all, for the heads-up. Have a good weekend!


Thanks,

Junping


From: Junping Du 
Sent: Tuesday, September 5, 2017 2:57 PM
To: larry mccay; Steve Loughran
Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop 2.8.2 Release Plan

I assume the quiet over the holiday means we agreed to move forward without 
taking HADOOP-14439 into 2.8.2.
There is a new release-build (docker based) issue that could be related to 
HADOOP-14474, where we removed the Oracle Java 7 installer due to a recent 
download address/contract change by Oracle. The build refuses to work, 
reporting a JAVA_HOME issue, and hard-coding my local Java home in 
create-release or the Dockerfile doesn't help, so we may need to add the Java 7 
installation back (either Oracle JDK 7 or OpenJDK 7).
Filed HADOOP-14842 with more details, tracked as a blocker for 2.8.2.

Thanks,

Junping

From: Junping Du 
Sent: Friday, September 1, 2017 12:37 PM
To: larry mccay; Steve Loughran
Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop 2.8.2 Release Plan

This issue (HADOOP-14439) was off my radar given that it is marked as Minor 
priority. If my understanding is correct, this is a trade-off between security 
and backward compatibility. IMO, the priority of security is generally higher 
than that of backward compatibility, especially since 2.8.0 is still a 
non-production release.
I think we should skip this for 2.8.2 as long as it doesn't break compatibility 
with 2.7.x. Thoughts?

Thanks,

Junping

From: larry mccay 
Sent: Friday, September 1, 2017 10:55 AM
To: Steve Loughran
Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop 2.8.2 Release Plan

If we do "fix" this in 2.8.2, we should seriously consider not doing so in
3.0. This is a very poor practice.

I can see an argument for backward compatibility in the 2.8.x line, though.

On Fri, Sep 1, 2017 at 1:41 PM, Steve Loughran 
wrote:

> One thing we need to consider is
>
> HADOOP-14439: regression: secret stripping from S3x URIs breaks some
> downstream code
>
> Hadoop 2.8 has a best-effort attempt to strip secrets from the
> toString() value of an s3a or s3n path where someone has embedded them in
> the URI; this has caused problems in some uses, specifically when people
> use secrets this way (bad) and assume that paths can round-trip to string
> and back.
>
> Should we fix this? If so, Hadoop 2.8.2 is the time to do it.
>
>
> > On 1 Sep 2017, at 11:14, Junping Du  wrote:
> >
> > HADOOP-14814 got committed and HADOOP-9747 got pushed out to 2.8.3, so we
> are clean on blocker/critical issues now.
> > I finished going through the JACC report, and no more incompatible
> public API changes were found between 2.8.2 and 2.7.4. I also checked the
> commit history and fixed 10+ commits which were missing from branch-2.8.2 for
> some reason. So the current branch-2.8.2 should be good to go for the RC
> stage, and I will kick off our first RC tomorrow.
> > In the 

[jira] [Created] (HDFS-12689) Ozone: SCM: Clean up of container in DELETING state

2017-10-19 Thread Nanda kumar (JIRA)
Nanda kumar created HDFS-12689:
--

 Summary: Ozone: SCM: Clean up of container in DELETING state
 Key: HDFS-12689
 URL: https://issues.apache.org/jira/browse/HDFS-12689
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Nanda kumar


When creating a container times out, the container is moved to the {{DELETING}} 
state. Once the container is in the DELETING state, {{ContainerStateManager}} 
should clean up and delete the container.






[jira] [Resolved] (HDFS-9808) Combine READ_ONLY_SHARED DatanodeStorages with the same ID

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs resolved HDFS-9808.
--
Resolution: Won't Fix

This was part of HDFS-11190.

> Combine READ_ONLY_SHARED DatanodeStorages with the same ID
> --
>
> Key: HDFS-9808
> URL: https://issues.apache.org/jira/browse/HDFS-9808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chris Douglas
>
> In HDFS-5318, each datanode that can reach a (read only) block reports itself 
> as a valid location for the block. While accurate, this increases (redundant) 
> block report traffic and- without partitioning on the backend- may return an 
> overwhelming number of replica locations for each block.
> Instead, a DN could report only that the shared storage is reachable. The 
> contents of the storage could be reported separately/synthetically to the 
> block manager, which can collapse all instances into a single storage. A 
> subset of locations- closest to the client, etc.- can be returned, rather 
> than all possible locations.






[jira] [Created] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message

2017-10-19 Thread Shriya Gupta (JIRA)
Shriya Gupta created HDFS-12688:
---

 Summary: HDFS File Not Removed Despite Successful "Moved to 
.Trash" Message
 Key: HDFS-12688
 URL: https://issues.apache.org/jira/browse/HDFS-12688
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.6.0
Reporter: Shriya Gupta
Priority: Critical


I wrote a simple script to delete and create a file and ran it multiple times. 
However, some executions of the script randomly threw a FileAlreadyExistsException 
while others succeeded, despite a successful hdfs dfs -rm command. The script 
is below; I have reproduced this in two different environments -- 
hdfs dfs -ls  /user/shriya/shell_test/
echo "starting hdfs remove **" 
hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
echo "hdfs rm completed!"
hdfs dfs -ls  /user/shriya/shell_test/
echo "starting mapReduce***"
mapred job -libjars 
/data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar 
-submit /data/home/shriya/shell_test/wordcountJob.xml

The message confirming successful move -- 

17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: 
'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: 
hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728

The contents of the subsequent -ls after -rm also showed that the file still 
existed.

The error I got when my MapReduce job tried to create the file -- 

17/10/19 14:50:00 WARN security.UserGroupInformation: 
PriviledgedActionException as: (auth:KERBEROS) 
cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: 
Output directory hdfs://nameservice1/user/shriya/shell_test/wordcountOutput 
already exists
at 
org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
at 
org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)
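
As a diagnostic sketch (not a fix), the deletion can be verified explicitly before the job is resubmitted; `-skipTrash` bypasses the trash move entirely, which helps isolate whether the trash interaction is involved. The path below is the one from the report:

```shell
# Delete without the trash move, to rule out trash-related behavior.
hdfs dfs -rm -r -f -skipTrash /user/shriya/shell_test/wordcountOutput

# 'hdfs dfs -test -e' exits non-zero once the path is really gone.
if hdfs dfs -test -e /user/shriya/shell_test/wordcountOutput; then
  echo "output dir still exists; do not submit yet"
else
  echo "output dir removed; safe to submit"
fi
```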






Re: Do we still have nightly (or even weekly) unit test run for Hadoop projects?

2017-10-19 Thread Wangda Tan
Gotcha, thanks!

- Wangda

On Thu, Oct 19, 2017 at 7:25 AM, Sean Busbey  wrote:

> Here's the email from last night to common-dev@hadoop:
>
> https://s.apache.org/ARe1
>
> On Wed, Oct 18, 2017 at 10:42 PM, Akira Ajisaka 
> wrote:
>
>> Yes, qbt runs nightly and it sends e-mail to dev lists.
>> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/
>>
>> Regards,
>> Akira
>>
>>
>> On 2017/10/19 7:54, Wangda Tan wrote:
>>
>>> Hi,
>>>
>>> Do we still have nightly (or even weekly) unit test run for Hadoop
>>> projects? I couldn't find it on the Jenkins dashboard and I haven't seen
>>> reports sent to dev lists for a while.
>>>
>>> Thanks,
>>> Wangda
>>>
>>>
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>>
>>
>
>
> --
> busbey
>


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-10-19 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/

[Oct 18, 2017 10:06:30 PM] (junping_du) HADOOP-14958. Fix source-level 
compatibility after HADOOP-11252.




-1 overall


The following subsystems voted -1:
unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.net.TestDNS 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting 
   hadoop.hdfs.TestReadStripedFileWithMissingBlocks 
   hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler 

Timed out junit tests :

   
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/diff-compile-javac-root.txt
  [284K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/whitespace-eol.txt
  [11M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/diff-javadoc-javadoc-root.txt
  [1.9M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [148K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [380K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [40K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/562/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [64K]

Powered by Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org


[jira] [Created] (HDFS-12687) Client has recovered DN will not be removed from the “filed”

2017-10-19 Thread xuzq (JIRA)
xuzq created HDFS-12687:
---

 Summary: Client has recovered DN will not be removed from the 
“filed”
 Key: HDFS-12687
 URL: https://issues.apache.org/jira/browse/HDFS-12687
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.1
Reporter: xuzq


When a client is writing a pipeline such as Client => DN1 => DN2 => DN3 and DN2 
crashes, the client executes the recovery process: the failed DN2 is added into 
"failed", the client requests a replacement DN from the NN (passing "failed"), 
and replaces DN2 in the pipeline, e.g. Client => DN1 => DN4 => DN3.

While this client keeps running and writing data to the file, there will of 
course be many pipelines, e.g. Client => DN-1 => DN-2 => DN-3. When DN-2 
crashes, DN-2 is added into "failed" and the client executes the recovery 
process as before: it gets a new DN from the NN using "failed", and 
{color:red}the NN selects a DN from all DNs excluding "failed", even if DN-2 
has restarted{color}.

Questions:
1. Why not remove DN2 (once restarted) from "failed"?
2. Why isn't the collection of error nodes used in the recovery process shared 
with the one used when getting the next block? e.g.
private final List<DatanodeInfo> failed = new ArrayList<>();
private final LoadingCache<DatanodeInfo, DatanodeInfo> excludedNodes;

As before, when DN2 crashes the client recovers the pipeline after a timeout 
(worst case about 490s by default). When the client finishes writing this block 
and requests the next block, the NN may return a block whose pipeline contains 
the failed DN2. When the client creates a new pipeline for the new block, 
{color:red}the client has to go through a connection timeout{color} (60s by 
default).

If "failed" and "excludedNodes" were one collection, the connection timeout 
would be avoided. And because entries in "excludedNodes" are dynamically 
expired, it would also avoid the first problem.









Re: Do we still have nightly (or even weekly) unit test run for Hadoop projects?

2017-10-19 Thread Sean Busbey
Here's the email from last night to common-dev@hadoop:

https://s.apache.org/ARe1

On Wed, Oct 18, 2017 at 10:42 PM, Akira Ajisaka  wrote:

> Yes, qbt runs nightly and it sends e-mail to dev lists.
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/
>
> Regards,
> Akira
>
>
> On 2017/10/19 7:54, Wangda Tan wrote:
>
>> Hi,
>>
>> Do we still have nightly (or even weekly) unit test run for Hadoop
>> projects? I couldn't find it on the Jenkins dashboard and I haven't seen
>> reports sent to dev lists for a while.
>>
>> Thanks,
>> Wangda
>>
>>
>
>


-- 
busbey


[jira] [Created] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart

2017-10-19 Thread SammiChen (JIRA)
SammiChen created HDFS-12686:


 Summary: Erasure coding system policy state is not correctly saved 
and loaded during real cluster restart
 Key: HDFS-12686
 URL: https://issues.apache.org/jira/browse/HDFS-12686
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0-beta1
Reporter: SammiChen
Assignee: SammiChen


Inspired by HDFS-12682, I found that the system erasure coding policy state is 
not correctly saved and loaded in a real cluster, even though there are unit 
tests for this and they all pass with MiniCluster. That is because MiniCluster 
keeps the same static system erasure coding policy object across the NN restart 
operation. 






[jira] [Created] (HDFS-12685) FsVolumeImpl exception when scanning Provided storage volume

2017-10-19 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12685:
-

 Summary: FsVolumeImpl exception when scanning Provided storage 
volume
 Key: HDFS-12685
 URL: https://issues.apache.org/jira/browse/HDFS-12685
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ewan Higgs


I left a Datanode running overnight and found this in the logs in the morning:

{code}
2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report 
for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.io.File.<init>(File.java:421)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155)

[jira] [Created] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12684:
--

 Summary: Ozone: SCM metrics NodeCount is overlapping with node 
manager metrics
 Key: HDFS-12684
 URL: https://issues.apache.org/jira/browse/HDFS-12684
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Priority: Minor


I found this issue while reviewing HDFS-11468. From http://scm_host:9876/jmx, 
both SCM and SCMNodeManager have a {{NodeCount}} metric:

{noformat}
 {
"name" : 
"Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
"modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
"ClientRpcPort" : "9860",
"DatanodeRpcPort" : "9861",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"CompileInfo" : "2017-10-17T06:47Z xxx",
"Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
"SoftwareVersion" : "3.1.0-SNAPSHOT",
"StartedTimeInMillis" : 1508393551065
  }, {
"name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
"modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"OutOfChillMode" : false,
"MinimumChillModeNodes" : 1,
"ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 0 
nodes reported, minimal 1 nodes required."
  }
{noformat}

Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.
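
For anyone checking the duplication, a sketch of querying a single bean through the JMX servlet's `qry` parameter (host and port are the ones from the report above):

```shell
# Fetch only the SCMNodeManager bean; compare its NodeCount list against the
# one currently duplicated under StorageContainerManagerInfo.
curl -s 'http://scm_host:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
```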


