[jira] [Reopened] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY

2019-07-08 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened HDFS-12748:


> NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
> ------------------------------------------------------------
>
> Key: HDFS-12748
> URL: https://issues.apache.org/jira/browse/HDFS-12748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12748-branch-3.1.01.patch, HDFS-12748.001.patch, 
> HDFS-12748.002.patch, HDFS-12748.003.patch, HDFS-12748.004.patch, 
> HDFS-12748.005.patch
>
>
> In our production environment, the standby NN often does full GC; using MAT we 
> found the largest object is FileSystem$Cache, which contains 7,844,890 
> DistributedFileSystem instances (the cache key includes the caller's UGI, and 
> each webhdfs request constructs a fresh UGI, so every request adds another 
> cached instance that is never closed).
> By viewing the call hierarchy of FileSystem.get(), I found that only 
> NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why it creates 
> a different DistributedFileSystem every time instead of getting a FileSystem 
> from the cache.
> {code:java}
> case GETHOMEDIRECTORY: {
>   final String js = JsonUtil.toJsonString("Path",
>       FileSystem.get(conf != null ? conf : new Configuration())
>           .getHomeDirectory().toUri().getPath());
>   return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
> }
> {code}
> When we close the FileSystem after serving GETHOMEDIRECTORY, the NN no longer 
> does full GC.
> {code:java}
> case GETHOMEDIRECTORY: {
>   FileSystem fs = null;
>   try {
>     fs = FileSystem.get(conf != null ? conf : new Configuration());
>     final String js = JsonUtil.toJsonString("Path",
>         fs.getHomeDirectory().toUri().getPath());
>     return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
>   } finally {
>     if (fs != null) {
>       fs.close();
>     }
>   }
> }
> {code}
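
A slightly tighter variant of the same fix (a sketch, not necessarily the 
committed patch) uses try-with-resources, since {{FileSystem}} implements 
{{Closeable}}:

{code:java}
case GETHOMEDIRECTORY: {
  // try-with-resources closes the FileSystem even if JSON serialization throws
  try (FileSystem fs =
      FileSystem.get(conf != null ? conf : new Configuration())) {
    final String js = JsonUtil.toJsonString("Path",
        fs.getHomeDirectory().toUri().getPath());
    return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
  }
}
{code}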



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-13351) Revert HDFS-11156 from branch-2

2018-03-26 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-13351:
--

 Summary: Revert HDFS-11156 from branch-2
 Key: HDFS-13351
 URL: https://issues.apache.org/jira/browse/HDFS-13351
 Project: Hadoop HDFS
  Issue Type: Task
  Components: webhdfs
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Per discussion in HDFS-11156, let's revert the change from branch-2. The new 
patch can be tracked in HDFS-12459.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12936) java.lang.OutOfMemoryError: unable to create new native thread

2017-12-18 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12936.

Resolution: Not A Bug

> java.lang.OutOfMemoryError: unable to create new native thread
> --
>
> Key: HDFS-12936
> URL: https://issues.apache.org/jira/browse/HDFS-12936
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
> Environment: CDH5.12
> hadoop2.6
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> I configured the max user processes to 65535 for every user, and the datanode 
> memory is 8G.
> When a lot of data was being written, the datanode was shut down.
> But I can see the memory usage was only < 1000M.
> Please see https://pan.baidu.com/s/1o7BE0cy
> *DataNode shutdown error log:*  
> {code:java}
> 2017-12-17 23:58:14,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder: 
> BP-1437036909-192.168.17.36-1509097205664:blk_1074725940_987917, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2017-12-17 23:58:31,425 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. 
> Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:01,426 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. 
> Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:05,520 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. 
> Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:31,429 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving BP-1437036909-192.168.17.36-1509097205664:blk_1074725951_987928 
> src: /192.168.17.54:40478 dest: /192.168.17.48:50010
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12770) Add doc about how to disable client socket cache

2017-11-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12770:
--

 Summary: Add doc about how to disable client socket cache
 Key: HDFS-12770
 URL: https://issues.apache.org/jira/browse/HDFS-12770
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


After HDFS-3365, the client socket cache (PeerCache) can be disabled, but 
there is no doc about this. We should add some doc in hdfs-default.xml to 
instruct users how to disable it; a sketch follows.
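
For illustration, a sketch of what such an entry might look like, assuming the 
HDFS-3365 behavior that a cache capacity of 0 disables the PeerCache (property 
name to be confirmed against the code):

{code}
<property>
  <name>dfs.client.socketcache.capacity</name>
  <value>0</value>
  <description>
    Size of the DFSClient socket cache (PeerCache). Setting this to 0
    disables client socket caching entirely.
  </description>
</property>
{code}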



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12757) DeadLock Happened Between DFSOutputStream and LeaseRenewer when LeaseRenewer#renew SocketTimeException

2017-11-02 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12757.

Resolution: Duplicate

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when 
> LeaseRenewer#renew SocketTimeException
> --
>
> Key: HDFS-12757
> URL: https://issues.apache.org/jira/browse/HDFS-12757
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Jiandan Yang 
>Priority: Major
> Attachments: HDFS-12757.patch
>
>
> Java stack is :
> {code:java}
> Found one Java-level deadlock:
> =
> "Topology-2 (735/2000)":
>   waiting to lock monitor 0x7fff4523e6e8 (object 0x0005d3521078, a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer),
>   which is held by "LeaseRenewer:admin@na61storage"
> "LeaseRenewer:admin@na61storage":
>   waiting to lock monitor 0x7fff5d41e838 (object 0x0005ec0dfa88, a 
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "Topology-2 (735/2000)"
> Java stack information for the threads listed above:
> ===
> "Topology-2 (735/2000)":
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:227)
> - waiting to lock <0x0005d3521078> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:86)
> at 
> org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:467)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:479)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.setClosed(DFSOutputStream.java:776)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeThreads(DFSOutputStream.java:791)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:848)
> - locked <0x0005ec0dfa88> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:805)
> - locked <0x0005ec0dfa88> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> ..
> "LeaseRenewer:admin@na61storage":
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:750)
> - waiting to lock <0x0005ec0dfa88> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
> at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:586)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:453)
> - locked <0x0005d3521078> (a 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:76)
> at 
> org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:310)
> at java.lang.Thread.run(Thread.java:834)
> Found 1 deadlock.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12744) More logs when short-circuit read is failed and disabled

2017-10-29 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12744:
--

 Summary: More logs when short-circuit read is failed and disabled
 Key: HDFS-12744
 URL: https://issues.apache.org/jira/browse/HDFS-12744
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Short-circuit read (SCR) failed with the following error:

{noformat}
2017-10-21 16:42:28,024 WARN  [B.defaultRpcServer.handler=7,queue=7,port=16020] 
impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR
while attempting to set up short-circuit access. Block xxx is not valid
{noformat}

then short-circuit read is disabled for *10 minutes* without any warning 
message in the log. This caused us to spend extra time figuring out why we had 
such a long window during which SCR was not working. Propose to add a warning 
log (as other places already do) to indicate SCR is disabled, plus some more 
logging in the DN to show what happened; a sketch of the intended warning 
follows.
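
A minimal sketch of the kind of warning intended (names and placement are 
illustrative, not the actual patch):

{code:java}
// Illustrative only: log loudly when SCR is disabled, including how long
// the remote-read fallback window lasts (the 10-minute window observed above).
private void warnShortCircuitDisabled(String cause, long disableWindowMs) {
  LOG.warn("Short-circuit reads disabled after error '{}'; falling back to "
      + "remote reads for the next {} ms.", cause, disableWindowMs);
}
{code}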



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12701) More fine-grained locks in ShortCircuitCache

2017-10-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12701:
--

 Summary: More fine-grained locks in ShortCircuitCache
 Key: HDFS-12701
 URL: https://issues.apache.org/jira/browse/HDFS-12701
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.8.1
Reporter: Weiwei Yang


When the cluster is heavily loaded, we found HBase regionserver handlers are 
often blocked by {{ShortCircuitCache}}. Dumped jstack and found lots of threads 
waiting to obtain the cache lock. This should be improvable by using more 
fine-grained locks; a generic sketch of the idea follows.
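
For illustration only (not the actual {{ShortCircuitCache}} code), lock 
striping replaces the single monitor with N locks selected by the key's hash, 
so operations on unrelated keys no longer contend:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

public class StripedLock {
  private final ReentrantLock[] stripes;

  public StripedLock(int n) {
    stripes = new ReentrantLock[n];
    for (int i = 0; i < n; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  // Keys that hash to different stripes can be locked in parallel.
  public ReentrantLock lockFor(Object key) {
    return stripes[Math.floorMod(key.hashCode(), stripes.length)];
  }
}
{code}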



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12684:
--

 Summary: Ozone: SCM metrics NodeCount is overlapping with node 
manager metrics
 Key: HDFS-12684
 URL: https://issues.apache.org/jira/browse/HDFS-12684
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Priority: Minor


I found this issue while reviewing HDFS-11468: at http://scm_host:9876/jmx, 
both SCM and SCMNodeManager have a {{NodeCount}} metric

{noformat}
 {
"name" : 
"Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
"modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
"ClientRpcPort" : "9860",
"DatanodeRpcPort" : "9861",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"CompileInfo" : "2017-10-17T06:47Z xxx",
"Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
"SoftwareVersion" : "3.1.0-SNAPSHOT",
"StartedTimeInMillis" : 1508393551065
  }, {
"name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
"modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"OutOfChillMode" : false,
"MinimumChillModeNodes" : 1,
"ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 0 
nodes reported, minimal 1 nodes required."
  }
{noformat}

Hence, propose to remove {{NodeCount}} from {{SCMMXBean}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12401) Ozone: TestBlockDeletingService#testBlockDeletionTimeout sometimes timeout

2017-10-03 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12401.

Resolution: Cannot Reproduce

> Ozone: TestBlockDeletingService#testBlockDeletionTimeout sometimes timeout
> --
>
> Key: HDFS-12401
> URL: https://issues.apache.org/jira/browse/HDFS-12401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Weiwei Yang
>Priority: Minor
>
> {code}
> testBlockDeletionTimeout(org.apache.hadoop.ozone.container.common.TestBlockDeletingService)
>   Time elapsed: 100.383 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics:
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12546) Ozone: DB listing operation performance improvement

2017-09-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12546:
--

 Summary: Ozone: DB listing operation performance improvement
 Key: HDFS-12546
 URL: https://issues.apache.org/jira/browse/HDFS-12546
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


While investigating HDFS-12506, I found several {{getRangeKVs}} calls that can 
be replaced by {{getSequentialRangeKVs}} to improve performance. This JIRA is 
to track these improvements, with sufficient tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12540) Ozone: node status text reported by SCM is confusing

2017-09-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12540:
--

 Summary: Ozone: node status text reported by SCM is confusing
 Key: HDFS-12540
 URL: https://issues.apache.org/jira/browse/HDFS-12540
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Trivial


At present SCM UI displays node status like following

{noformat}
Node Manager: Chill mode status: Out of chill mode. 15 of out of total 1 nodes 
have reported in.
{noformat}

This text is a bit confusing. The UI retrieves the status from 
{{SCMNodeManager#getNodeStatus}}; the related call is {{#getChillModeStatus}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12539) Ozone: refactor some functions in KSMMetadataManagerImpl to be more readable and reusable

2017-09-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12539:
--

 Summary: Ozone: refactor some functions in KSMMetadataManagerImpl 
to be more readable and reusable
 Key: HDFS-12539
 URL: https://issues.apache.org/jira/browse/HDFS-12539
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Minor


This is from [~anu]'s review comment in HDFS-12506, 
[https://issues.apache.org/jira/browse/HDFS-12506?focusedCommentId=16178356&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16178356].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-12415) Ozone: TestXceiverClientManager and TestAllocateContainer occasionally fails

2017-09-23 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened HDFS-12415:


> Ozone: TestXceiverClientManager and TestAllocateContainer occasionally fails
> ------------------------------------------------------------
>
> Key: HDFS-12415
> URL: https://issues.apache.org/jira/browse/HDFS-12415
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12415-HDFS-7240.001.patch, 
> HDFS-12415-HDFS-7240.002.patch, HDFS-12415-HDFS-7240.003.patch
>
>
> TestXceiverClientManager seems to be occasionally failing in some jenkins 
> jobs,
> {noformat}
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125)
> {noformat}
> see more from [this 
> report|https://builds.apache.org/job/PreCommit-HDFS-Build/21065/testReport/]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12524) Ozone: Record number of keys scanned and hinted for getRangeKVs call

2017-09-21 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12524:
--

 Summary: Ozone: Record number of keys scanned and hinted for 
getRangeKVs call
 Key: HDFS-12524
 URL: https://issues.apache.org/jira/browse/HDFS-12524
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: logging, ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


Add debug logging to record the number of keys scanned and hinted for 
{{getRangeKVs}} calls; this will be helpful for debugging performance issues, 
since {{getRangeKVs}} is often where the lag shows up. A sketch of the intended 
debug line follows.
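
A minimal sketch of the intended debug line (hypothetical variable names; the 
real change would wire this into the metadata store scan loop):

{code:java}
if (LOG.isDebugEnabled()) {
  LOG.debug("getRangeKVs: scanned {} keys, returned {} keys, took {} ms",
      keysScanned, result.size(), Time.monotonicNow() - startMs);
}
{code}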



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12506) Ozone: ListBucket is too slow

2017-09-20 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12506:
--

 Summary: Ozone: ListBucket is too slow
 Key: HDFS-12506
 URL: https://issues.apache.org/jira/browse/HDFS-12506
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Blocker


Generated 3 million keys in ozone, then ran the {{listBucket}} command to get a 
list of buckets under a volume,

{code}
bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
{code}

This call took over *15 seconds* to finish. The problem is caused by the 
inflexible structure of the KSM DB. Right now {{ksm.db}} stores keys like the 
following:

{code}
/v1/b1
/v1/b1/k1
/v1/b1/k2
/v1/b1/k3
/v1/b2
/v1/b2/k1
/v1/b2/k2
/v1/b2/k3
/v1/b3
/v1/b4
{code}

Keys are sorted in natural order, so to list the buckets under a volume, e.g. 
/v1, we seek to /v1 and then iterate and filter keys; this ends up scanning all 
keys under volume /v1. The problem with this design is that we have no 
efficient way to locate all buckets without scanning the keys. One illustrative 
alternative is sketched below.
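
For illustration (a sketch of the idea, not the committed design), giving 
bucket entries their own contiguous key prefix, e.g. a hypothetical 
{{#bucket#}} marker, would turn a bucket listing into a short range scan that 
never touches key entries:

{code}
/v1/#bucket#/b1
/v1/#bucket#/b2
/v1/#bucket#/b3
/v1/#bucket#/b4
/v1/b1/k1
/v1/b1/k2
{code}

A listBucket would then seek to /v1/#bucket#/ and stop at the first entry 
outside that prefix.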



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12504) Ozone: Improve SQLCLI performance

2017-09-20 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12504:
--

 Summary: Ozone: Improve SQLCLI performance
 Key: HDFS-12504
 URL: https://issues.apache.org/jira/browse/HDFS-12504
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


In my test, my {{ksm.db}} has *3017660* entries with a total size of *128mb*; 
the SQLCLI tool ran for over *2 hours* and still did not finish exporting the 
DB. This is because it iterates over each entry and inserts it into another 
sqlite DB file one at a time, which is not efficient. We need to improve this 
so it runs efficiently on large DB files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12503) Ozone: some UX improvements to oz_debug

2017-09-20 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12503:
--

 Summary: Ozone: some UX improvements to oz_debug
 Key: HDFS-12503
 URL: https://issues.apache.org/jira/browse/HDFS-12503
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


I tried to use {{oz_debug}} to dump the KSM DB for offline analysis and found a 
few problems that need to be fixed to make this tool easier to use. I know this 
is a debug tool for admins, but it's still worth improving the UX so new users 
(like me) can figure out how to use it without reading more docs.

# Support the *--help* argument. --help is the general arg for all hdfs scripts 
to print usage.
# When specifying the output path {{-o}}, we need to add a description to let 
the user know the path needs to be a file (instead of a dir). If the path is 
specified as a dir, it ends up with a funny error, {{unable to open the 
database file (out of memory)}}, which is pretty misleading. It would also help 
to add a check that the specified path is not an existing dir.
# SQLCLI currently swallows exceptions.
# We should remove the {{levelDB}} wording from the command output, as we use 
rocksDB by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12500) Ozone: add logger for oz shell commands and move error stack traces to DEBUG level

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12500:
--

 Summary: Ozone: add logger for oz shell commands and move error 
stack traces to DEBUG level
 Key: HDFS-12500
 URL: https://issues.apache.org/jira/browse/HDFS-12500
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Minor


Per discussion in HDFS-12489, to reduce the verbosity of logs when an 
exception happens, let's add a logger to {{Shell.java}} and move error stack 
traces to DEBUG level; a sketch follows.
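
A minimal sketch of the intent (illustrative placement and names, not the 
actual patch):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Shell {
  private static final Logger LOG = LoggerFactory.getLogger(Shell.class);

  private void reportError(Exception ex) {
    // Users still get a one-line error; the full stack trace moves to DEBUG.
    System.err.println("Command Failed : " + ex.getMessage());
    LOG.debug("Command failed", ex);
  }
}
{code}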



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12492) Ozone: ListVolume output misses some attributes

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12492:
--

 Summary: Ozone: ListVolume output misses some attributes
 Key: HDFS-12492
 URL: https://issues.apache.org/jira/browse/HDFS-12492
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


When doing a listVolume call, we get output like the following:

{noformat}
[ {
  "owner" : {
    "name" : "wwei"
  },
  "quota" : {
    "unit" : "TB",
    "size" : 1048576
  },
  "volumeName" : "vol-0-84022",
  "createdOn" : "Mon, 18 Sep 2017 03:09:46 GMT",
  "createdBy" : null,
  "bytesUsed" : 0,
  "bucketCount" : 0
} ]
{noformat}

Values for *createdOn*, *createdBy*, *bytesUsed* and *bucketCount* are all 
missing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12489) Ozone: OzoneRestClientException swallows exceptions which makes client hard to debug failures

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12489:
--

 Summary: Ozone: OzoneRestClientException swallows exceptions which 
makes client hard to debug failures 
 Key: HDFS-12489
 URL: https://issues.apache.org/jira/browse/HDFS-12489
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


There are multiple try-catch places that swallow exceptions when transforming 
some other exception into an {{OzoneRestClientException}}. As a result, when 
the client runs into such code paths, it loses track of what was going on, 
which makes debugging extremely difficult. See the example below:

{code}
bin/hdfs oz -listBucket  http://15oz1.fyre.ibm.com:9864/vol-0-84022 -user wwei
Command Failed : {"httpCode":0,"shortMessage":"Read timed 
out","resource":null,"message":"Read timed 
out","requestID":null,"hostName":null}
{code}

The returned message doesn't help much in debugging where and how the read 
timed out.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12488) Ozone: OzoneRestClient has no notion of configuration

2017-09-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12488:
--

 Summary: Ozone: OzoneRestClient has no notion of configuration
 Key: HDFS-12488
 URL: https://issues.apache.org/jira/browse/HDFS-12488
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang


When I test ozone on a 15-node cluster with millions of keys, responses of the 
rest client become slower. The following call times out after the default 5s:

{code}
bin/hdfs oz -listBucket  http://15oz1.fyre.ibm.com:9864/vol-0-84022 -user wwei
Command Failed : {"httpCode":0,"shortMessage":"Read timed 
out","resource":null,"message":"Read timed 
out","requestID":null,"hostName":null}
{code}

Then I increased the timeout by explicitly setting the following property in 
{{ozone-site.xml}}:

{code}
<property>
  <name>ozone.client.socket.timeout.ms</name>
  <value>1</value>
</property>
{code}

but this doesn't work: rest clients are still created with the default *5s* 
timeout. This needs to be fixed. Just like {{DFSClient}}, we should make 
{{OzoneRestClient}} configuration-aware, so that clients can adjust client 
configuration on demand; a sketch follows.
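
A rough sketch of the idea (illustrative names, not the actual 
{{OzoneRestClient}} API), using Apache HttpClient's {{RequestConfig}} to honor 
a configured timeout instead of a hardcoded default:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.http.client.config.RequestConfig;

public class RestClientTimeoutSketch {
  // Read the timeout from configuration, falling back to the 5s default.
  static RequestConfig buildRequestConfig(Configuration conf) {
    int socketTimeoutMs = conf.getInt("ozone.client.socket.timeout.ms", 5000);
    return RequestConfig.custom()
        .setSocketTimeout(socketTimeoutMs)
        .setConnectTimeout(socketTimeoutMs)
        .build();
  }
}
{code}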



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12477) Ozone: Some minor text improvement in SCM web UI

2017-09-17 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12477:
--

 Summary: Ozone: Some minor text improvement in SCM web UI
 Key: HDFS-12477
 URL: https://issues.apache.org/jira/browse/HDFS-12477
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: scm, ui
Reporter: Weiwei Yang
Priority: Trivial


While trying out the SCM UI, there seem to be some small text problems:

bq. Node Manager: Minimum chill mode nodes)

It has an extra ).

bq. $$hashKey   object:9

I am not really sure what this means. Is it useful to display?

bq. Node counts

Can we place the HEALTHY ones at the top of the table?

bq. Node Manager: Chill mode status: Out of chill mode. 15 of out of total 1 
nodes have reported in.

Can we refine this text a bit?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12461) Ozone: Ozone data placement is not even

2017-09-15 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12461.

   Resolution: Not A Problem
 Assignee: Weiwei Yang
Fix Version/s: HDFS-7240

> Ozone: Ozone data placement is not even
> ---
>
> Key: HDFS-12461
> URL: https://issues.apache.org/jira/browse/HDFS-12461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Weiwei Yang
>Priority: Blocker
>  Labels: ozoneMerge
> Fix For: HDFS-7240
>
>
> On a machine with 3 data disks, Ozone keeps on picking the same disk to place 
> all containers. Looks like we have a bug in the round robin selection of 
> disks.
> Steps to Reproduce:
> 1. Install an Ozone cluster.
> 2. Make sure that datanodes have more than one disk.
> 3. Run corona few times, each run creates more containers.
> 4. Login into the data node.
> 5. Run a command like tree or ls -R /data or independently verify each 
> location.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12463) Ozone: Fix TestXceiverClientMetrics#testMetrics

2017-09-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12463:
--

 Summary: Ozone: Fix TestXceiverClientMetrics#testMetrics 
 Key: HDFS-12463
 URL: https://issues.apache.org/jira/browse/HDFS-12463
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Priority: Minor


{{TestXceiverClientMetrics#testMetrics}} is failing with the following error 
in a recent jenkins job,

{noformat}
java.util.ConcurrentModificationException: null
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at 
org.apache.hadoop.ozone.scm.TestXceiverClientMetrics.lambda$testMetrics$2(TestXceiverClientMetrics.java:134)
{noformat}

Looks like a non-thread-safe list caused this race condition in the test case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12459) Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2017-09-14 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12459:
--

 Summary: Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS 
REST API
 Key: HDFS-12459
 URL: https://issues.apache.org/jira/browse/HDFS-12459
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-11156 was reverted because the implementation was non-optimal; based on 
the suggestion from [~shahrs87], we should avoid creating a dfs client to get 
block locations because that creates an extra RPC call. Instead we should use 
{{NamenodeProtocols#getBlockLocations}} and then convert {{LocatedBlocks}} to 
{{BlockLocation[]}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2017-09-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened HDFS-11156:


> Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
> ------------------------------------------------------------
>
> Key: HDFS-11156
> URL: https://issues.apache.org/jira/browse/HDFS-11156
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.7.3
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Fix For: 3.0.0-alpha2
>
> Attachments: BlockLocationProperties_JSON_Schema.jpg, 
> BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, 
> HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, 
> HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, 
> HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, 
> HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, 
> HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, 
> HDFS-11156.16.patch, HDFS-11156-branch-2.01.patch, 
> Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg
>
>
> Following webhdfs REST API
> {code}
> http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GET_BLOCK_LOCATIONS&offset=0&length=1
> {code}
> will get a response like
> {code}
> {
>   "LocatedBlocks" : {
> "fileLength" : 1073741824,
> "isLastBlockComplete" : true,
> "isUnderConstruction" : false,
> "lastLocatedBlock" : { ... },
> "locatedBlocks" : [ {...} ]
>   }
> }
> {code}
> This represents *o.a.h.h.p.LocatedBlocks*. However, according to the 
> *FileSystem* API,
> {code}
> public BlockLocation[] getFileBlockLocations(Path p, long start, long len)
> {code}
> clients would expect an array of BlockLocation. This mismatch should be 
> fixed. Marked as Incompatible change as this will change the output of the 
> GET_BLOCK_LOCATIONS API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm

2017-09-13 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12443:
--

 Summary: Ozone: Improve SCM block deletion throttling algorithm 
 Key: HDFS-12443
 URL: https://issues.apache.org/jira/browse/HDFS-12443
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Currently SCM periodically scans the delLog to send deletion transactions to 
datanodes. The throttling algorithm is simple: it scans at most 
{{BLOCK_DELETE_TX_PER_REQUEST_LIMIT}} (by default 50) transactions at a time. 
This is non-optimal; in the worst case it might cache 50 TXs for 50 different 
DNs, so each DN gets only 1 TX to proceed per interval, which makes deletion 
slow. An improvement is to throttle per datanode, e.g. 50 TXs per datanode per 
interval, as sketched below.
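
A generic sketch of the per-datanode throttle (illustrative types and names, 
not the actual SCM classes):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerDatanodeThrottle {
  static final int TX_PER_DN_LIMIT = 50;

  // txLog entries are (datanodeId, txId) pairs; each DN may receive up to
  // TX_PER_DN_LIMIT transactions per interval instead of all DNs sharing
  // one global 50-transaction budget.
  static Map<String, List<Long>> select(List<Map.Entry<String, Long>> txLog) {
    Map<String, List<Long>> perDn = new HashMap<>();
    for (Map.Entry<String, Long> tx : txLog) {
      List<Long> picked =
          perDn.computeIfAbsent(tx.getKey(), k -> new ArrayList<>());
      if (picked.size() < TX_PER_DN_LIMIT) {
        picked.add(tx.getValue());
      }
    }
    return perDn;
  }
}
{code}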



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12440) Ozone: TestAllocateContainer fails on jenkins

2017-09-13 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12440.

   Resolution: Duplicate
Fix Version/s: HDFS-7240

> Ozone: TestAllocateContainer fails on jenkins
> -
>
> Key: HDFS-12440
> URL: https://issues.apache.org/jira/browse/HDFS-12440
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: HDFS-7240
>
>
> I am seeing this failure in [this jenkins 
> report|https://builds.apache.org/job/PreCommit-HDFS-Build/21089/testReport/org.apache.hadoop.ozone.scm/TestAllocateContainer/testAllocate/],
>  with the following error
> {noformat}
> Stacktrace
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147)
>  at 
> org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125)
>  at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>  at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12440) Ozone: TestAllocateContainer fails on jenkins

2017-09-13 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12440:
--

 Summary: Ozone: TestAllocateContainer fails on jenkins
 Key: HDFS-12440
 URL: https://issues.apache.org/jira/browse/HDFS-12440
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


I am seeing this failure in [this jenkins 
report|https://builds.apache.org/job/PreCommit-HDFS-Build/21067/testReport/org.apache.hadoop.ozone.scm/TestAllocateContainer/org_apache_hadoop_ozone_scm_TestAllocateContainer/],
 with the following error

{noformat}
Stacktrace

java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.MiniDFSCluster.setDataNodeStorageCapacities(MiniDFSCluster.java:1715)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.setDataNodeStorageCapacities(MiniDFSCluster.java:1694)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1674)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:882)
at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:494)
at 
org.apache.hadoop.ozone.MiniOzoneCluster.(MiniOzoneCluster.java:98)
at 
org.apache.hadoop.ozone.MiniOzoneCluster.(MiniOzoneCluster.java:77)
at 
org.apache.hadoop.ozone.MiniOzoneCluster$Builder.build(MiniOzoneCluster.java:441)
at 
org.apache.hadoop.ozone.scm.TestAllocateContainer.init(TestAllocateContainer.java:56)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12415) Ozone: TestXceiverClientManager occasionally fails

2017-09-11 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12415:
--

 Summary: Ozone: TestXceiverClientManager occasionally fails
 Key: HDFS-12415
 URL: https://issues.apache.org/jira/browse/HDFS-12415
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


TestXceiverClientManager seems to be occasionally failing in some jenkins jobs,

{noformat}
java.lang.NullPointerException
 at 
org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828)
 at 
org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147)
 at 
org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125)
{noformat}

see more from [this 
report|https://builds.apache.org/job/PreCommit-HDFS-Build/21065/testReport/]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12391) Ozone: TestKSMSQLCli is not working as expected

2017-09-04 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12391:
--

 Summary: Ozone: TestKSMSQLCli is not working as expected
 Key: HDFS-12391
 URL: https://issues.apache.org/jira/browse/HDFS-12391
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


I found this issue while investigating the {{TestKSMSQLCli}} failure in [this 
jenkins 
report|https://builds.apache.org/job/PreCommit-HDFS-Build/20984/testReport/]: 
the test is supposed to use a parameterized class to test both the {{LevelDB}} 
and {{RocksDB}} implementations of the metadata store, however it only tests 
the default {{RocksDB}} case, twice. A sketch of the usual cause is below.
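
For illustration, the usual shape of this bug in a JUnit parameterized test 
(hypothetical code, not the actual test): the parameter is declared but never 
wired into the configuration, so every run exercises the default store.

{code:java}
import java.util.Arrays;
import java.util.Collection;
import org.junit.Before;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestKSMSQLCli {
  @Parameterized.Parameters
  public static Collection<Object[]> data() {
    return Arrays.asList(new Object[][] {{"LevelDB"}, {"RocksDB"}});
  }

  private final String storeType;

  public TestKSMSQLCli(String storeType) {
    this.storeType = storeType;
  }

  @Before
  public void setup() {
    // The fix: actually apply storeType to the cluster configuration here;
    // if the parameter is never read, both runs silently use the default.
  }
}
{code}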



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12367) Ozone: Too many open files error while running corona

2017-09-04 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12367.

Resolution: Duplicate

I think this issue no longer happens to me, so I'm closing it as a dup of 
HDFS-12382, as it looks to have been fixed there, thanks [~nandakumar131]. 
[~msingh] feel free to create another lower-severity JIRA to track the resource 
leaks you found at the code level. I will close this one as it is no longer a 
blocker for tests.

> Ozone: Too many open files error while running corona
> -
>
> Key: HDFS-12367
> URL: https://issues.apache.org/jira/browse/HDFS-12367
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Reporter: Weiwei Yang
>Assignee: Mukul Kumar Singh
>
> The "too many open files" error keeps happening to me while using corona. I 
> have simply set up a single-node cluster and run corona to generate 1000 
> keys, but I keep getting the following error
> {noformat}
> ./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 
> 1000
> 17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1
> 17/08/28 00:47:42 INFO tools.Corona: Mode: offline
> 17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1.
> 17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1.
> 17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000.
> 17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with 
> wwei as owner and quota set to 1152921504606846976 bytes.
> 17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread.
> ...
> ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: 
> bucket-0-34960 of volume: vol-0-05000.
> java.io.IOException: Exception getting XceiverClient.
>   at 
> org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156)
>   at 
> org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122)
>   at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289)
>   at 
> org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487)
>   at 
> org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: com.google.common.util.concurrent.UncheckedExecutionException: 
> java.lang.IllegalStateException: failed to create a child event loop
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>   at 
> org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144)
>   ... 9 more
> Caused by: java.lang.IllegalStateException: failed to create a child event 
> loop
>   at 
> io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:68)
>   at 
> io.netty.channel.MultithreadEventLoopGroup.(MultithreadEventLoopGroup.java:49)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:61)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:52)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:44)
>   at 
> io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:36)
>   at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76)
>   at 
> org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151)
>   at 
> org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>   ... 12 more
> Caused by: io.netty.channel.ChannelException: failed to open a new selector
>   at 

[jira] [Created] (HDFS-12389) Ozone: oz commandline list calls should return valid JSON format output

2017-09-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12389:
--

 Summary: Ozone: oz commandline list calls should return valid JSON 
format output
 Key: HDFS-12389
 URL: https://issues.apache.org/jira/browse/HDFS-12389
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


At present the outputs of {{listVolume}}, {{listBucket}} and {{listKey}} are 
hard to parse. For example, the following call

{code}
./bin/hdfs oz -listVolume http://localhost:9864 -user wwei
{code}

lists all volumes in my cluster and it returns

{noformat}
{
"version" : 0,
"md5hash" : null,
"createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"size" : 10240,
"keyName" : "key-0-22381",
"dataFileName" : null
  }
 {  
"version" : 0,
"md5hash" : null,
"createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT",
"size" : 10240,
"keyName" : "key-0-22381",
"dataFileName" : null
  }
  ...
{noformat}

This is not valid JSON output, hence it is hard to parse in client scripts for 
further interaction. Propose to reformat these calls to emit valid JSON, as 
sketched below.
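
For illustration, the same listing emitted as one well-formed JSON document (an 
array of objects, fields abbreviated from the sample above), which any JSON 
parser can consume directly:

{noformat}
[ {
  "version" : 0,
  "keyName" : "key-0-22381",
  "size" : 10240
}, {
  "version" : 0,
  "keyName" : "key-0-22381",
  "size" : 10240
} ]
{noformat}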



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12367) Ozone: Too many open files error while running corona

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12367:
--

 Summary: Ozone: Too many open files error while running corona
 Key: HDFS-12367
 URL: https://issues.apache.org/jira/browse/HDFS-12367
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, tools
Reporter: Weiwei Yang


The "too many open files" error keeps happening to me while using corona. I 
have simply set up a single-node cluster and run corona to generate 1000 keys, 
but I keep getting the following error

{noformat}
./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 
1000
17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1
17/08/28 00:47:42 INFO tools.Corona: Mode: offline
17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1.
17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1.
17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000.
17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with 
wwei as owner and quota set to 1152921504606846976 bytes.
17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread.
...
ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: 
bucket-0-34960 of volume: vol-0-05000.
java.io.IOException: Exception getting XceiverClient.
at 
org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156)
at 
org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289)
at 
org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487)
at 
org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.IllegalStateException: failed to create a child event loop
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at 
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at 
org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144)
... 9 more
Caused by: java.lang.IllegalStateException: failed to create a child event loop
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:68)
at 
io.netty.channel.MultithreadEventLoopGroup.(MultithreadEventLoopGroup.java:49)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:61)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:52)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:44)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:36)
at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76)
at 
org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151)
at 
org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145)
at 
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
... 12 more
Caused by: io.netty.channel.ChannelException: failed to open a new selector
at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:128)
at io.netty.channel.nio.NioEventLoop.(NioEventLoop.java:120)
at 
io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:64)
... 25 more
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.(EPollArrayWrapper.java:130)
at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:69)
at 
sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)

[jira] [Created] (HDFS-12366) Ozone: Refactor KSM metadata class names to avoid confusion

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12366:
--

 Summary: Ozone: Refactor KSM metadata class names to avoid 
confusion
 Key: HDFS-12366
 URL: https://issues.apache.org/jira/browse/HDFS-12366
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Trivial


Propose to rename 2 classes in package {{org.apache.hadoop.ozone.ksm}}

* MetadataManager -> KsmMetadataManager
* MetadataManagerImpl -> KsmMetadataManagerImpl

This is to avoid confusion with the ozone metadata store classes, such as 
{{MetadataKeyFilters}}, {{MetadataStore}} and {{MetadataStoreBuilder}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12365) Ozone: ListVolume displays incorrect createdOn time when the volume was created by OzoneRpcClient

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12365:
--

 Summary: Ozone: ListVolume displays incorrect createdOn time when 
the volume was created by OzoneRpcClient
 Key: HDFS-12365
 URL: https://issues.apache.org/jira/browse/HDFS-12365
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Steps to reproduce:

1. Create a key in ozone with corona (this delegates the call to 
OzoneRpcClient), e.g

{code}
[wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs corona -numOfThreads 1 
-numOfVolumes 1 -numOfBuckets 1 -numOfKeys 1
{code}

2. Run listVolume

{code}
[wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs oz -listVolume 
http://localhost:9864 -user wwei
{
  "owner" : {
"name" : "wwei"
  },
  "quota" : {
"unit" : "TB",
"size" : 1048576
  },
  "volumeName" : "vol-0-31437",
  "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT",
  "createdBy" : null
}
{
  "owner" : {
"name" : "wwei"
  },
  "quota" : {
"unit" : "TB",
"size" : 1048576
  },
  "volumeName" : "vol-0-38900",
  "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT",
  "createdBy" : null
}
{code}

Note, the times displayed in {{createdOn}} are both the incorrect {{Thu, 01 Jan 
1970 00:00:00 GMT}}, i.e. the Unix epoch, which means the creation time is 
never set on this code path.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12362) Ozone: write deleted block to RAFT log for consensus on datanodes

2017-08-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12362:
--

 Summary: Ozone: write deleted block to RAFT log for consensus on 
datanodes
 Key: HDFS-12362
 URL: https://issues.apache.org/jira/browse/HDFS-12362
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang


Per discussion in HDFS-12282, we need to write deleted block info to the RAFT 
log when that is ready; see more in the [comment from Anu | 
https://issues.apache.org/jira/browse/HDFS-12282?focusedCommentId=16136022&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16136022].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty

2017-08-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12361:
--

 Summary: Ozone: SCM failed to start when a container metadata is 
empty
 Key: HDFS-12361
 URL: https://issues.apache.org/jira/browse/HDFS-12361
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


When I run tests that create keys via corona, it sometimes leaves some 
containers with empty metadata. This might also happen if SCM stopped at a 
point where the metadata was not yet written. When this happens, we get the 
following error and SCM cannot be started:

{noformat}
17/08/27 20:10:57 WARN datanode.DataNode: Unexpected exception in block pool 
Block pool BP-821804790-172.16.165.133-1503887277256 (Datanode Uuid 
7ee16a59-9604-406e-a0f8-6f44650a725b) service to 
ozone1.fyre.ibm.com/172.16.165.133:8111
java.lang.NullPointerException
at 
org.apache.hadoop.ozone.container.common.helpers.ContainerData.getFromProtBuf(ContainerData.java:66)
at 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainerInfo(ContainerManagerImpl.java:210)
at 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:158)
at 
org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:99)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:77)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1592)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:409)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:783)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:286)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
at java.lang.Thread.run(Thread.java:745)
{noformat}

We should add an NPE check and mark such containers as inactive without failing 
the SCM.
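
A minimal sketch of such a guard in the {{readContainerInfo}} path (helper names 
are illustrative, not the actual patch):

{code:java}
// Inside ContainerManagerImpl#readContainerInfo (sketch):
ContainerProtos.ContainerData proto = readContainerProto(containerFile);
if (proto == null) {
  // Empty metadata file: mark the container inactive instead of letting
  // an NPE abort the datanode state machine / SCM startup.
  LOG.warn("Empty container metadata, marking inactive: {}", containerFile);
  markContainerInactive(containerName); // assumed helper
  return;
}
ContainerData containerData = ContainerData.getFromProtBuf(proto);
{code}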



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12354) Improve the throttle algorithm in Datanode BlockDeletingService

2017-08-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12354:
--

 Summary: Improve the throttle algorithm in Datanode 
BlockDeletingService 
 Key: HDFS-12354
 URL: https://issues.apache.org/jira/browse/HDFS-12354
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


{{BlockDeletingService}} is a per-datanode container block deleting service that 
takes charge of the "real" deletion of ozone blocks. It spawns a worker thread 
per container and deletes blocks/chunks from disk in background threads. The 
number of threads is currently throttled by 
{{ozone.block.deleting.container.limit.per.interval}}, but there is a potential 
problem: containers are sorted, so it always fetches the same set of containers. 
We need to fix this by creating an API in {{ContainerManagerImpl}} to get a 
shuffled list of containers, as sketched below.
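
A minimal, self-contained illustration of the shuffling idea (the actual API 
name in {{ContainerManagerImpl}} may differ):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ShuffledContainerChooser {
  /** Pick up to limit container names in random order for one interval. */
  public static List<String> chooseContainers(List<String> allContainers,
      int limit) {
    List<String> shuffled = new ArrayList<>(allContainers);
    // Randomize so we do not always fetch the same sorted head of the list.
    Collections.shuffle(shuffled);
    return shuffled.subList(0, Math.min(limit, shuffled.size()));
  }
}
{code}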



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12039) Ozone: Implement update volume owner in ozone shell

2017-08-17 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12039.

   Resolution: Fixed
Fix Version/s: HDFS-7240

> Ozone: Implement update volume owner in ozone shell
> ---
>
> Key: HDFS-12039
> URL: https://issues.apache.org/jira/browse/HDFS-12039
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Lokesh Jain
> Fix For: HDFS-7240
>
>
> Ozone shell command {{updateVolume}} should support updating the owner of a 
> volume, using the following syntax
> {code}
> hdfs oz -updateVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -owner 
> xyz -root
> {code}
> This also works from the REST API; the following command changes the volume 
> owner to {{www}}
> {code}
> curl -X PUT -H "Date: Mon, 26 Jun 2017 04:23:30 GMT" -H "x-ozone-version: v1" 
> -H "x-ozone-user:www" -H "Authorization:OZONE root" 
> http://ozone1.fyre.ibm.com:9864/volume-wwei-0
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12307) Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails

2017-08-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12307.

Resolution: Duplicate
  Assignee: Weiwei Yang

> Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails
> ---
>
> Key: HDFS-12307
> URL: https://issues.apache.org/jira/browse/HDFS-12307
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> It seems this UT constantly fails with the following error
> {noformat}
> org.apache.hadoop.ozone.web.exceptions.OzoneException: Exception getting 
> XceiverClient.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119)
>   at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:243)
>   at 
> com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:146)
>   at 
> com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:133)
>   at 
> com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1579)
>   at 
> com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1200)
>   at 
> org.apache.hadoop.ozone.web.exceptions.OzoneException.parse(OzoneException.java:248)
>   at 
> org.apache.hadoop.ozone.web.client.OzoneBucket.executeGetKey(OzoneBucket.java:395)
>   at 
> org.apache.hadoop.ozone.web.client.OzoneBucket.getKey(OzoneBucket.java:321)
>   at 
> org.apache.hadoop.ozone.web.client.TestKeys.runTestPutAndGetKeyWithDnRestart(TestKeys.java:288)
>   at 
> org.apache.hadoop.ozone.web.client.TestKeys.testPutAndGetKeyWithDnRestart(TestKeys.java:265)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12307) Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails

2017-08-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12307:
--

 Summary: Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails
 Key: HDFS-12307
 URL: https://issues.apache.org/jira/browse/HDFS-12307
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


It seems this UT constantly fails with the following error

{noformat}
org.apache.hadoop.ozone.web.exceptions.OzoneException: Exception getting 
XceiverClient.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119)
at 
com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:243)
at 
com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:146)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:133)
at 
com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1579)
at 
com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1200)
at 
org.apache.hadoop.ozone.web.exceptions.OzoneException.parse(OzoneException.java:248)
at 
org.apache.hadoop.ozone.web.client.OzoneBucket.executeGetKey(OzoneBucket.java:395)
at 
org.apache.hadoop.ozone.web.client.OzoneBucket.getKey(OzoneBucket.java:321)
at 
org.apache.hadoop.ozone.web.client.TestKeys.runTestPutAndGetKeyWithDnRestart(TestKeys.java:288)
at 
org.apache.hadoop.ozone.web.client.TestKeys.testPutAndGetKeyWithDnRestart(TestKeys.java:265)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12283) Ozone: DeleteKey-5: Implement SCM DeletedBlockLog

2017-08-09 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12283:
--

 Summary: Ozone: DeleteKey-5: Implement SCM DeletedBlockLog
 Key: HDFS-12283
 URL: https://issues.apache.org/jira/browse/HDFS-12283
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang


The DeletedBlockLog is a persisted log in SCM that keeps track of container 
blocks which are under deletion. It maintains info about under-deletion container 
blocks notified by KSM, and the state of how they are processed. We can use 
RocksDB to implement the first version of the log; the schema looks like

||TxID||ContainerName||Block List||ProcessedCount||
|0|c1|b1,b2,b3|0|
|1|c2|b1|3|
|2|c2|b2, b3|-1|

Some explanations

# TxID is an incremental long value transaction ID for ONE container and 
multiple blocks
# Container name is the name of the container
# Block list is a list of block IDs
# ProcessedCount is the number of times SCM has sent this record to a datanode; 
it represents the "state" of the transaction and is in the range of \[-1, 5\]: -1 
means the transaction eventually failed after some retries, and 5 is the max 
number of retries.

We need to define {{DeletedBlockLog}} as an interface and implement this with 
RocksDB {{MetadataStore}} as the first version.
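
A possible first cut of that interface, assuming a protobuf-generated 
{{DeletedBlocksTransaction}} value type (all names are suggestions, not the 
final API):

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public interface DeletedBlockLog extends Closeable {
  /** Fetch up to count transactions eligible for (re)sending to datanodes. */
  List<DeletedBlocksTransaction> getTransactions(int count) throws IOException;

  /** Append a new transaction for one container and its block list. */
  void addTransaction(String containerName, List<String> blocks)
      throws IOException;

  /** Bump ProcessedCount after a send; -1 marks a permanently failed tx. */
  void incrementCount(List<Long> txIDs) throws IOException;

  /** Remove transactions once datanodes acknowledge the deletes. */
  void commitTransactions(List<Long> txIDs) throws IOException;
}
{code}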



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12282) Ozone: DeleteKey-4: SCM periodically sends block deletion message to datanode via HB and handles response

2017-08-09 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12282:
--

 Summary: Ozone: DeleteKey-4: SCM periodically sends block deletion 
message to datanode via HB and handles response
 Key: HDFS-12282
 URL: https://issues.apache.org/jira/browse/HDFS-12282
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang


This is task 3 in the design doc; it implements the SCM-to-datanode 
interactions (see the sketch after this list), including:

# SCM sends block deletion message via HB to datanode
# datanode changes block state to deleting when processing the HB response
# datanode sends deletion ACKs back to SCM
# SCM handles ACKs and removes blocks in DB
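
A rough datanode-side sketch of steps 2 and 3 (all proto types and accessors 
here are assumptions, not the actual wire format):

{code:java}
// Sketch: process block deletion transactions carried in the HB response.
void handleHeartbeatResponse(SCMHeartbeatResponseProto response) {
  List<Long> acked = new ArrayList<>();
  for (DeletedBlocksTransaction tx
      : response.getDeletedBlocksTransactionsList()) {
    // Step 2: flip block state to "deleting"; the on-disk reclaim is done
    // later by the background block deleting service.
    markBlocksDeleting(tx.getContainerName(), tx.getBlockIDList());
    acked.add(tx.getTxID());
  }
  // Step 3: piggyback ACKs on the next heartbeat so SCM can remove the
  // corresponding records from its DB (step 4).
  queueDeletionAcks(acked);
}
{code}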



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12246) Ozone: potential thread leaks

2017-08-02 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12246:
--

 Summary: Ozone: potential thread leaks
 Key: HDFS-12246
 URL: https://issues.apache.org/jira/browse/HDFS-12246
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


Per discussion in HDFS-12163, there might be some places that potentially leak 
threads; we will use this jira to track the work to fix those leaks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12235) Ozone: DeleteKey-3: KSM SCM block deletion message and ACK interactions

2017-08-01 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12235:
--

 Summary: Ozone: DeleteKey-3: KSM SCM block deletion message and 
ACK interactions
 Key: HDFS-12235
 URL: https://issues.apache.org/jira/browse/HDFS-12235
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


KSM and SCM interaction for the delete key operation: both KSM and SCM store key 
state info in a backlog. KSM needs to scan this log and send the block-deletion 
command to SCM; once SCM is fully aware of the message, KSM removes the key 
completely from the namespace.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background

2017-07-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12196:
--

 Summary: Ozone: DeleteKey-2: Implement container recycling service 
to delete stale blocks at background
 Key: HDFS-12196
 URL: https://issues.apache.org/jira/browse/HDFS-12196
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Implement a recycling service running on the datanode to delete stale blocks 
periodically.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12195) Ozone: DeleteKey-1: KSM replies delete key request asynchronously

2017-07-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12195:
--

 Summary: Ozone: DeleteKey-1: KSM replies delete key request 
asynchronously
 Key: HDFS-12195
 URL: https://issues.apache.org/jira/browse/HDFS-12195
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Yuanbo Liu


We will implement delete key in ozone in multiple child tasks; this is one of the 
child tasks, implementing the client to SCM communication. We need to do it in an 
async manner: once the key state is changed in KSM metadata, KSM is ready to 
reply to the client with a successful message. Actual deletes on other layers 
will happen some time later, as the sketch below illustrates.
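
A sketch of the asynchronous contract on the KSM side (lock, helper and table 
names are illustrative):

{code:java}
// Sketch: deleteKey returns as soon as the key is flagged in KSM metadata.
public void deleteKey(KeyArgs args) throws IOException {
  metadataLock.lock();
  try {
    byte[] objectKey = dbKeyFor(args.getVolumeName(),
        args.getBucketName(), args.getKeyName());     // assumed helper
    // Move the key out of the live namespace into a pending-delete backlog;
    // blocks are reclaimed later by SCM/datanode background services.
    metadataStore.move(objectKey, DELETED_KEY_PREFIX); // assumed API
  } finally {
    metadataLock.unlock();
  }
  // Reaching this point means the client immediately gets a success reply.
}
{code}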



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey

2017-07-20 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12167:
--

 Summary: Ozone: Intermittent failure 
TestContainerPersistence#testListKey
 Key: HDFS-12167
 URL: https://issues.apache.org/jira/browse/HDFS-12167
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Reporter: Weiwei Yang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12149:
--

 Summary: Ozone: RocksDB implementation of ozone metadata store
 Key: HDFS-12149
 URL: https://issues.apache.org/jira/browse/HDFS-12149
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-12069 added a general interface for the ozone metadata store. We already 
have a leveldb implementation; this JIRA is to track the work on the rocksdb 
implementation.
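
A minimal sketch of a RocksDB-backed store using the standard {{org.rocksdb}} 
Java binding (only the basic operations are shown; the real {{MetadataStore}} 
interface has more):

{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDBStore implements AutoCloseable {
  static {
    RocksDB.loadLibrary(); // load the native library once
  }

  private final Options options;
  private final RocksDB db;

  public RocksDBStore(String dbPath) throws RocksDBException {
    options = new Options().setCreateIfMissing(true);
    db = RocksDB.open(options, dbPath);
  }

  public void put(byte[] key, byte[] value) throws RocksDBException {
    db.put(key, value);
  }

  public byte[] get(byte[] key) throws RocksDBException {
    return db.get(key); // returns null when the key is absent
  }

  public void delete(byte[] key) throws RocksDBException {
    db.delete(key);
  }

  @Override
  public void close() {
    db.close();
    options.close();
  }
}
{code}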



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties

2017-07-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12148:
--

 Summary: Ozone: TestOzoneConfigurationFields is failing because 
ozone-default.xml has some missing properties
 Key: HDFS-12148
 URL: https://issues.apache.org/jira/browse/HDFS-12148
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


The following properties added by HDFS-11493 are missing in ozone-default.xml:

{noformat}
ozone.scm.max.container.report.threads
ozone.scm.container.report.processing.interval.seconds
ozone.scm.container.reports.wait.timeout.seconds
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12129) Ozone

2017-07-12 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12129:
--

 Summary: Ozone
 Key: HDFS-12129
 URL: https://issues.apache.org/jira/browse/HDFS-12129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Weiwei Yang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-07 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12098:
--

 Summary: Ozone: Datanode is unable to register with scm if scm 
starts later
 Key: HDFS-12098
 URL: https://issues.apache.org/jira/browse/HDFS-12098
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


Reproducing steps
# Start datanode
# Wait and observe the datanode state; it has connection issues, which is expected
# Start SCM, expecting the datanode to connect to the SCM and the state machine 
to transit to RUNNING. However, in actuality its state transits to SHUTDOWN and 
the datanode enters chill mode.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12096) Ozone: Bucket versioning design document

2017-07-06 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-12096.

Resolution: Duplicate

> Ozone: Bucket versioning design document
> 
>
> Key: HDFS-12096
> URL: https://issues.apache.org/jira/browse/HDFS-12096
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: Ozone Bucket Versioning v1.pdf
>
>
> This JIRA is opened for the discussion of the bucket versioning design. 
> The bucket versioning is the ability to hold multiple versions of objects of 
> a key in a bucket.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12096) Ozone: Bucket versioning design document

2017-07-06 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12096:
--

 Summary: Ozone: Bucket versioning design document
 Key: HDFS-12096
 URL: https://issues.apache.org/jira/browse/HDFS-12096
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


This JIRA is opened for the discussion of the bucket versioning design. 
The bucket versioning is the ability to hold multiple versions of objects of a 
key in a bucket.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12085) Reconfigure namenode interval fails if the interval was set with time unit

2017-07-04 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12085:
--

 Summary: Reconfigure namenode interval fails if the interval was 
set with time unit
 Key: HDFS-12085
 URL: https://issues.apache.org/jira/browse/HDFS-12085
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, tools
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


It fails when I set the duration with a time unit, e.g. 5s; error:

{noformat}
Reconfiguring status for node [localhost:8111]: started at Tue Jul 04 08:14:18 
PDT 2017 and finished at Tue Jul 04 08:14:18 PDT 2017.
FAILED: Change property dfs.heartbeat.interval
From: "3s"
To: "5s"
Error: For input string: "5s".
{noformat}

Time unit support was added via HDFS-9847.
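
The startup path parses this property with Hadoop's time-unit-aware helper, so 
one plausible fix is to use the same helper on the reconfiguration path instead 
of a plain integer parse. A small, self-contained demonstration:

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeDurationDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("dfs.heartbeat.interval", "5s");
    // getTimeDuration understands suffixed values like "5s" or "3m";
    // a bare number is interpreted in the given default unit.
    long seconds = conf.getTimeDuration(
        "dfs.heartbeat.interval", 3L, TimeUnit.SECONDS);
    System.out.println(seconds); // prints 5
  }
}
{code}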



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12082) BlockInvalidateLimit value is incorrectly set after namenode heartbeat interval reconfigured

2017-07-04 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12082:
--

 Summary: BlockInvalidateLimit value is incorrectly set after 
namenode heartbeat interval reconfigured 
 Key: HDFS-12082
 URL: https://issues.apache.org/jira/browse/HDFS-12082
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-1477 provides an option to reconfigure the namenode heartbeat interval 
without restarting the namenode. When the heartbeat interval is reconfigured, 
{{blockInvalidateLimit}} gets recomputed:

{code}
 this.blockInvalidateLimit = Math.max(20 * (int) (intervalSeconds),
DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_DEFAULT);
{code}

This doesn't honor an existing value set by {{dfs.block.invalidate.limit}}.
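
One possible shape of the fix (a sketch, not the committed patch) is to read the 
configured value back instead of hard-coding the default:

{code:java}
// Honor dfs.block.invalidate.limit when the user set it explicitly:
final int configuredLimit = conf.getInt(
    DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_KEY,
    DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_DEFAULT);
this.blockInvalidateLimit = Math.max(20 * (int) intervalSeconds,
    configuredLimit);
{code}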



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12081) Ozone: Add infoKey REST API document

2017-07-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12081:
--

 Summary: Ozone: Add infoKey REST API document
 Key: HDFS-12081
 URL: https://issues.apache.org/jira/browse/HDFS-12081
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-12030 has implemented {{infoKey}}; we need to add appropriate documentation 
to {{OzoneRest.md}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12080) Ozone: Fix UT failure in TestOzoneConfigurationFields

2017-07-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12080:
--

 Summary: Ozone: Fix UT failure in TestOzoneConfigurationFields
 Key: HDFS-12080
 URL: https://issues.apache.org/jira/browse/HDFS-12080
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Priority: Minor


HDFS-12023 added a test case {{TestOzoneConfigurationFields}} to make sure 
ozone configuration properties are fully documented in ozone-default.xml. It is 
currently failing because:

1. ozone-default.xml has one property that is not used anywhere

{code}
ozone.scm.internal.bind.host
{code}

2. Some cblock properties are missing in ozone-default.xml

{code}
  dfs.cblock.scm.ipaddress
  dfs.cblock.scm.port
  dfs.cblock.jscsi-address
  dfs.cblock.service.rpc-bind-host
  dfs.cblock.jscsi.rpc-bind-host
{code}

This needs to be fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12079) Description of dfs.block.invalidate.limit is incorrect in hdfs-default.xml

2017-07-03 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12079:
--

 Summary: Description of dfs.block.invalidate.limit is incorrect in 
hdfs-default.xml
 Key: HDFS-12079
 URL: https://issues.apache.org/jira/browse/HDFS-12079
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Weiwei Yang
Assignee: Weiwei Yang


The description of the property {{dfs.block.invalidate.limit}} in 
hdfs-default.xml is

{noformat}
Limit on the list of invalidated block list kept by the Namenode.
{noformat}

This seems incorrect and would confuse users.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12069) Ozone: Create a general abstraction for metadata store

2017-06-29 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12069:
--

 Summary: Ozone: Create a general abstraction for metadata store
 Key: HDFS-12069
 URL: https://issues.apache.org/jira/browse/HDFS-12069
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Create a general abstraction for the metadata store so that we can plug in other 
key-value stores to host ozone metadata. Currently only levelDB is implemented; 
we want to support RocksDB as it provides more production-ready features.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12053) Ozone: ozone server should create missing metadata directory if it has permission to

2017-06-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12053:
--

 Summary: Ozone: ozone server should create missing metadata 
directory if it has permission to
 Key: HDFS-12053
 URL: https://issues.apache.org/jira/browse/HDFS-12053
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


The datanode state machine right now simply fails if the container metadata 
directory is missing; it is better to create the directory if it has permission 
to. This is extremely useful at a fresh setup, where usually we set 
{{ozone.container.metadata.dirs}} to be under the same parent as 
{{dfs.datanode.data.dir}}, e.g.

* /hadoop/hdfs/data
* /hadoop/hdfs/scm

If I don't pre-create /hadoop/hdfs/scm/repository, ozone cannot be started.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12047) Ozone: Add REST API documentation

2017-06-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12047:
--

 Summary: Ozone: Add REST API documentation
 Key: HDFS-12047
 URL: https://issues.apache.org/jira/browse/HDFS-12047
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Add ozone REST API documentation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12039) Ozone: Implement update volume owner in ozone shell

2017-06-26 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12039:
--

 Summary: Ozone: Implement update volume owner in ozone shell
 Key: HDFS-12039
 URL: https://issues.apache.org/jira/browse/HDFS-12039
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


The ozone shell command {{updateVolume}} should support updating the owner of a 
volume, using the following syntax

{code}
hdfs oz -updateVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -owner xyz 
-root
{code}

This also works from the REST API; the following command changes the volume 
owner to {{www}}

{code}
curl -X PUT -H "Date: Mon, 26 Jun 2017 04:23:30 GMT" -H "x-ozone-version: v1" 
-H "x-ozone-user:www" -H "Authorization:OZONE root" 
http://ozone1.fyre.ibm.com:9864/volume-wwei-0
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12037) Ozone: Improvement rest API output format for better looking

2017-06-26 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12037:
--

 Summary: Ozone: Improvement rest API output format for better 
looking
 Key: HDFS-12037
 URL: https://issues.apache.org/jira/browse/HDFS-12037
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Right now the ozone REST API output is displayed as a raw JSON string on a 
single line, which is not quite human readable:

{noformat}
{"volumes":[{"owner":{"name":"wwei"},"quota":{"unit":"GB","size":200},"volumeName":"volume-aug-1","createdOn":null,"createdBy":null}]}

{noformat}

Propose to improve the output format with a pretty printer:

{noformat}
{
  "volumes" : [ {
"owner" : {
  "name" : "wwei"
},
"quota" : {
  "unit" : "GB",
  "size" : 200
},
"volumeName" : "volume-aug-1",
"createdOn" : null,
"createdBy" : null
  } ]
}
{noformat}
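
Since the handlers already use Jackson, a sketch of the change is just 
re-serializing with Jackson's default pretty printer (the exact wiring in the 
handler may differ):

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
// Parse the raw single-line response, then write it back out indented.
Object tree = mapper.readValue(rawJson, Object.class);
String pretty = mapper.writerWithDefaultPrettyPrinter()
    .writeValueAsString(tree);
{code}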



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12035) Ozone: listKey doesn't work from ozone commandline

2017-06-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12035:
--

 Summary: Ozone: listKey doesn't work from ozone commandline
 Key: HDFS-12035
 URL: https://issues.apache.org/jira/browse/HDFS-12035
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


HDFS-11782 implements the listKey operation on the KSM server side, but the 
command line doesn't work right now:

{code}
./bin/hdfs oz -listKey http://ozone1.fyre.ibm.com:9864/volume-wwei-0/bucket1/
{code}

gives the following output:

{noformat}
Command Failed : 
{"httpCode":400,"shortMessage":"invalidBucketName","resource":"wwei","message":"Illegal
 max number of keys specified, the value must be in range (0, 1024], actual : 
0.","requestID":"d1a33851-6bfa-48d2-9afc-9dd7b06dfb0e","hostName":"ozone1.fyre.ibm.com"}
{noformat}

I think we have the following things missing (see the sketch after this list):

# ListKeyHandler doesn't support the common listing arguments: start, length and 
prefix.
# The HTTP request in {{Bucket#listBucket}} uses 0 as the default value; I think 
that is why we got the "Illegal max number of keys specified" error from the 
command line.
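
For the second point, a minimal client-side guard could look like this (the 
parameter name and builder are illustrative of how the REST call is constructed):

{code:java}
// Never send 0; the server only accepts max-keys in (0, 1024].
static final int DEFAULT_MAX_KEYS = 1024;

int maxKeys = (requestedMaxKeys > 0 && requestedMaxKeys <= DEFAULT_MAX_KEYS)
    ? requestedMaxKeys : DEFAULT_MAX_KEYS;
uriBuilder.setParameter("max-keys", String.valueOf(maxKeys));
{code}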



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11918) Ozone: Encapsulate KSM metadata key for better (de)serialization

2017-06-23 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-11918.

Resolution: Later

> Ozone: Encapsulate KSM metadata key for better (de)serialization
> 
>
> Key: HDFS-11918
> URL: https://issues.apache.org/jira/browse/HDFS-11918
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: HDFS-11918-HDFS-7240.001.patch
>
>
> There are multiple type of keys stored in KSM database
> # Volume Key
> # Bucket Key
> # Object Key
> # User Key
> Currently they are represented as plain strings with some conventions, such as
> # /volume
> # /volume/bucket
> # /volume/bucket/key
> # $user
> This approach makes it difficult to parse volumes/buckets/keys from the KSM 
> database. Propose to encapsulate these types of keys into protobuf messages, 
> and take advantage of protobuf to serialize (deserialize) classes to byte 
> arrays (and vice versa).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11984) Ozone: Ensures listKey lists all required key fields

2017-06-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11984:
--

 Summary: Ozone: Ensures listKey lists all required key fields
 Key: HDFS-11984
 URL: https://issues.apache.org/jira/browse/HDFS-11984
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


HDFS-11782 implements the listKey operation, which only lists the basic key 
fields; we need to make sure it returns all required fields:

# version
# md5hash
# createdOn
# size
# keyName
# dataFileName

This task depends on the work in HDFS-11886. See more discussion [here | 
https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11959) Ozone: Audit Logs

2017-06-09 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11959:
--

 Summary: Ozone: Audit Logs
 Key: HDFS-11959
 URL: https://issues.apache.org/jira/browse/HDFS-11959
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Add audit logs for ozone components.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11958) Ozone: Ensure KSM is initiated using ProtobufRpcEngine

2017-06-09 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11958:
--

 Summary: Ozone: Ensure KSM is initiated using ProtobufRpcEngine
 Key: HDFS-11958
 URL: https://issues.apache.org/jira/browse/HDFS-11958
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


Reproduce Steps

# Launch an ozone cluster
# Create a volume via commandline
{code}
hdfs oz -createVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -user root
{code}

It failed with the following error:

{noformat}
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to 
the HTTP container
java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID
 at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:182)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invocation.<init>(WritableRpcEngine.java:114)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:247)
at com.sun.proxy.$Proxy18.createVolume(Unknown Source)
...
Caused by: java.lang.NoSuchFieldException: versionID
at java.lang.Class.getField(Class.java:1703)
at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:178)
... 25 more
{noformat}

This is because {{keySpaceManagerClient}} in {{ObjectStoreHandler}} is currently 
not properly initiated; it should be using {{ProtobufRpcEngine}} instead of 
{{WritableRpcEngine}}, which is deprecated.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11955) Ozone: Set proper parameter default values for listBuckets http request

2017-06-08 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11955:
--

 Summary: Ozone: Set proper parameter default values for 
listBuckets http request
 Key: HDFS-11955
 URL: https://issues.apache.org/jira/browse/HDFS-11955
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-11779 implements the listBuckets function on the ozone server side. The API 
supports several parameters (startKey, count and prefix), but all of them are 
optional on the client-side REST API. This jira is to make sure we set proper 
default values in the HTTP request if they are not explicitly set by users.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11952) Ozone: Fix regression TestContainerSQLCli#testConvertContainerDB

2017-06-08 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11952:
--

 Summary: Ozone: Fix regression 
TestContainerSQLCli#testConvertContainerDB
 Key: HDFS-11952
 URL: https://issues.apache.org/jira/browse/HDFS-11952
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


TestContainerSQLCli#testConvertContainerDB is failing since HDFS-11568. Error 
message:
{noformat}
2017-06-08 08:21:47,653 [main] ERROR  - DB path not 
exist:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/MiniOzoneCluster1113d40f-586f-4914-9ac4-a37c1a3a561d/05bdadbc-1e60-46e0-bf57-efc4f21f2e7e/scm/container.db
...
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.ozone.scm.TestContainerSQLCli.testConvertContainerDB(TestContainerSQLCli.java:255)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11951) Ozone

2017-06-08 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11951:
--

 Summary: Ozone
 Key: HDFS-11951
 URL: https://issues.apache.org/jira/browse/HDFS-11951
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Weiwei Yang






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11926) Ozone: Implement a common helper to return a range of KVs in levelDB

2017-06-05 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11926:
--

 Summary: Ozone: Implement a common helper to return a range of KVs 
in levelDB
 Key: HDFS-11926
 URL: https://issues.apache.org/jira/browse/HDFS-11926
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


There are quite a few *LIST* operations that need to get a range of keys or 
values from levelDB and filter entries by key prefix:

# HDFS-11782 listKeys
# HDFS-11779 listBuckets
# HDFS-11773 listVolumes
# HDFS-11679 listContainers

We need to implement a common utility for them; a rough sketch follows.
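
A possible shape of the helper against the {{org.iq80.leveldb}} API (the 
signature is only a suggestion; the real utility may live behind the metadata 
store abstraction):

{code:java}
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;

public final class LevelDBRangeUtil {
  /** Collect up to count entries from startKey whose keys match prefix. */
  public static List<Map.Entry<byte[], byte[]>> getRangeKVs(
      DB db, byte[] startKey, int count, byte[] prefix) throws Exception {
    List<Map.Entry<byte[], byte[]>> result = new ArrayList<>();
    try (DBIterator it = db.iterator()) {
      if (startKey == null) {
        it.seekToFirst();
      } else {
        it.seek(startKey);
      }
      while (it.hasNext() && result.size() < count) {
        Map.Entry<byte[], byte[]> e = it.next();
        if (prefix != null && !startsWith(e.getKey(), prefix)) {
          break; // keys are sorted, so the prefix range has ended
        }
        result.add(new AbstractMap.SimpleImmutableEntry<>(
            e.getKey(), e.getValue()));
      }
    }
    return result;
  }

  private static boolean startsWith(byte[] key, byte[] prefix) {
    if (key.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (key[i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }
}
{code}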



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11917) Why when using the hdfs nfs gateway, a file which is smaller than one block size required a block

2017-06-02 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-11917.

Resolution: Not A Problem
  Assignee: Weiwei Yang

> Why when using the hdfs nfs gateway, a file which is smaller than one block 
> size required a block
> -
>
> Key: HDFS-11917
> URL: https://issues.apache.org/jira/browse/HDFS-11917
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.8.0
>Reporter: BINGHUI WANG
>Assignee: Weiwei Yang
>
> I use the linux shell to put files into hdfs through the hdfs nfs gateway. I 
> found that if a file is smaller than one block (128M), it still takes one 
> block (128M) of hdfs storage this way. But after a few minutes the excess 
> storage will be released.
> e.g. If I put a file (60M) into hdfs through the hdfs nfs gateway, it takes 
> one block (128M) at first. After a few minutes the excess storage (68M) will 
> be released. The file only uses 60M of hdfs storage at last.
> Why is this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11918) Ozone: Encapsulate KSM metadata key into protobuf messages for better (de)serialization

2017-06-02 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11918:
--

 Summary: Ozone: Encapsulate KSM metadata key into protobuf 
messages for better (de)serialization
 Key: HDFS-11918
 URL: https://issues.apache.org/jira/browse/HDFS-11918
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


There are multiple type of keys stored in KSM database
# Volume Key
# Bucket Key
# Object Key
# User Key

Currently they are represented as plain strings with different conventions, such 
as
# /volume
# /volume/bucket
# /volume/bucket/key
# $user

This approach makes it difficult to parse volumes/buckets/keys from the KSM 
database. Propose to encapsulate these types of keys into protobuf messages, 
and take advantage of protobuf to serialize (deserialize) classes to byte arrays 
(and vice versa).





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11913) Ozone: TestKeySpaceManager#testDeleteVolume fails

2017-06-01 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11913:
--

 Summary: Ozone: TestKeySpaceManager#testDeleteVolume fails
 Key: HDFS-11913
 URL: https://issues.apache.org/jira/browse/HDFS-11913
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-11774 introduces a UT failure, {{TestKeySpaceManager#testDeleteVolume}}; the 
error is as below

{noformat}
java.util.NoSuchElementException
 at 
org.fusesource.leveldbjni.internal.JniDBIterator.peekNext(JniDBIterator.java:84)
 at org.fusesource.leveldbjni.internal.JniDBIterator.next(JniDBIterator.java:98)
 at org.fusesource.leveldbjni.internal.JniDBIterator.next(JniDBIterator.java:45)
 at 
org.apache.hadoop.ozone.ksm.MetadataManagerImpl.isVolumeEmpty(MetadataManagerImpl.java:221)
 at 
org.apache.hadoop.ozone.ksm.VolumeManagerImpl.deleteVolume(VolumeManagerImpl.java:294)
 at 
org.apache.hadoop.ozone.ksm.KeySpaceManager.deleteVolume(KeySpaceManager.java:340)
 at 
org.apache.hadoop.ozone.protocolPB.KeySpaceManagerProtocolServerSideTranslatorPB.deleteVolume(KeySpaceManagerProtocolServerSideTranslatorPB.java:200)
 at 
org.apache.hadoop.ozone.protocol.proto.KeySpaceManagerProtocolProtos$KeySpaceManagerService$2.callBlockingMethod(KeySpaceManagerProtocolProtos.java:22742)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659)
{noformat}

This is caused by buggy code in {{MetadataManagerImpl#isVolumeEmpty}}; there are 
2 issues that need to be fixed:
# Iterating to the next element throws this exception if there is no next 
element. This always fails when a volume is empty.
# The code was checking whether the first bucket name starts with 
"/volume_name"; this returns a wrong value if there are several empty volumes 
with the same prefix, e.g. "/volA/" and "/volAA/". In such a case 
{{isVolumeEmpty}} will return false, as the next element after "/volA/" is not a 
bucket, it's another volume "/volAA/" that matches the prefix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11740) Ozone: Differentiate time interval for different DatanodeStateMachine state tasks

2017-05-31 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-11740.

Resolution: Later

Will revisit this if necessary in the future.

> Ozone: Differentiate time interval for different DatanodeStateMachine state 
> tasks
> -
>
> Key: HDFS-11740
> URL: https://issues.apache.org/jira/browse/HDFS-11740
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-11740-HDFS-7240.001.patch, 
> HDFS-11740-HDFS-7240.002.patch, HDFS-11740-HDFS-7240.003.patch, 
> statemachine_1.png, statemachine_2.png
>
>
> Currently the datanode state machine transitions between tasks at a fixed time 
> interval, defined by {{ScmConfigKeys#OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}}, 
> whose default value is 30s. Once the datanode is started, it needs 90s before 
> transiting to the {{Heartbeat}} state; such a long lag is not necessary. 
> Propose to improve the time interval handling: it seems only the heartbeat 
> task needs to be scheduled at the {{OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}} 
> interval, the rest should be done without any lag.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11873) Ozone: Object store handler cannot serve requests from same http client

2017-05-23 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11873:
--

 Summary: Ozone: Object store handler cannot serve requests from 
same http client
 Key: HDFS-11873
 URL: https://issues.apache.org/jira/browse/HDFS-11873
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


This issue was found when I worked on HDFS-11846. Instead of creating a new 
http client instance per request, I tried to reuse {{CloseableHttpClient}} in 
the {{OzoneClient}} class with a {{PoolingHttpClientConnectionManager}}. However, 
every second request from the http client hangs and never gets dispatched 
to {{ObjectStoreJerseyContainer}}. There seems to be something wrong in the 
netty pipeline; this jira aims to 1) fix the problem on the server side, 2) use 
a pool of http clients on the client side to reduce the resource overhead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11871) balance include Parameter Usage Error

2017-05-23 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-11871.

Resolution: Not A Problem

> balance include Parameter Usage Error
> -
>
> Key: HDFS-11871
> URL: https://issues.apache.org/jira/browse/HDFS-11871
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.7.3
>Reporter: kevy liu
>Assignee: Weiwei Yang
>Priority: Trivial
>
> [hadoop@bigdata-hdp-apache505 hadoop-2.7.2]$ bin/hdfs balancer -h
> Usage: hdfs balancer
> [-policy <policy>]  the balancing policy: datanode or blockpool
> [-threshold <threshold>]Percentage of disk capacity
> [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]  
> Excludes the specified datanodes.
> [-include [-f <hosts-file> | <comma-separated list of hosts>]]  
> Includes only the specified datanodes.
> [-idleiterations <idleiterations>]  Number of consecutive idle 
> iterations (-1 for Infinite) before exit.
> Parameter Description:
> -f <hosts-file> | <comma-separated list of hosts> 
> The parse separator in the code is:
> String[] nodes = line.split("[ \t\n\f\r]+");



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11846) Ozone: Potential http connection leaks in ozone clients

2017-05-17 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11846:
--

 Summary: Ozone: Potential http connection leaks in ozone clients
 Key: HDFS-11846
 URL: https://issues.apache.org/jira/browse/HDFS-11846
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


There are some http clients in {{OzoneVolume}}, {{OzoneBucket}} and 
{{OzoneClient}} that are not properly closed, which would cause resource leaks. 
This jira's purpose is to fix these issues and investigate whether we can reuse 
some of the http connections for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11845) Ozone: Output error when DN handshakes with SCM

2017-05-17 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11845:
--

 Summary: Ozone: Output error when DN handshakes with SCM
 Key: HDFS-11845
 URL: https://issues.apache.org/jira/browse/HDFS-11845
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


When starting SCM and DN, there is always an error in the SCM log

{noformat}
17/05/17 15:19:59 WARN ipc.Server: IPC Server handler 9 on 9861, call Call#4 
Retry#0 
org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol.getVersion 
from 172.16.165.133:44824: output error
17/05/17 15:19:59 INFO ipc.Server: IPC Server handler 9 on 9861 caught an 
exception
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3216)
at org.apache.hadoop.ipc.Server.access$1600(Server.java:135)
at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1463)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1533)
at 
org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2581)
at org.apache.hadoop.ipc.Server$Connection.access$300(Server.java:1605)
at org.apache.hadoop.ipc.Server$RpcCall.doResponse(Server.java:931)
at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:765)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11844) Ozone: Recover SCM state when SCM is restarted

2017-05-17 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11844:
--

 Summary: Ozone: Recover SCM state when SCM is restarted
 Key: HDFS-11844
 URL: https://issues.apache.org/jira/browse/HDFS-11844
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang


SCM loses its state once restarted. A simple test can be done by the 
following steps:

# Start NN, DN, SCM
# Create several containers via SCM CLI
# Restart SCM
# Get the existing container state via SCM CLI; this step fails with a 
"container doesn't exist" error.

{{ContainerManagerImpl}} maintains a cache of the container mapping in 
{{containerMap}}; if SCM is restarted, this information is lost. We need a way 
to restore the state from the DB in a background thread.
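
An illustrative recovery pass that could run on startup (the store API and 
pipeline codec are assumptions, not actual code):

{code:java}
// Sketch: rebuild containerMap from the persisted container store.
private void loadExistingContainers() throws IOException {
  int loaded = 0;
  for (Map.Entry<byte[], byte[]> entry : containerStore.getAllEntries()) {
    String containerName = new String(entry.getKey(), StandardCharsets.UTF_8);
    Pipeline pipeline = decodePipeline(entry.getValue()); // assumed codec
    containerMap.put(containerName, pipeline);
    loaded++;
  }
  LOG.info("Restored {} containers from the SCM DB", loaded);
}
{code}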



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11830) Ozone: Datanode needs to re-register to SCM if SCM is restarted

2017-05-16 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11830:
--

 Summary: Ozone: Datanode needs to re-register to SCM if SCM is 
restarted
 Key: HDFS-11830
 URL: https://issues.apache.org/jira/browse/HDFS-11830
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Problem description:

# Start NN, DN, SCM
# Restart SCM and you will see the following warnings in the SCM log:
17/05/02 00:47:08 WARN node.SCMNodeManager: SCM receive heartbeat from 
unregistered datanode

The datanode could not re-establish communication with SCM afterwards. Propose 
to fix this by adding a new command in HB handling that tells the datanode to 
re-register with SCM. Once the datanode receives this command, it transits to 
the REGISTER state again to proceed.
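
An illustrative shape of the SCM-side change (the proto and builder names are 
assumptions):

{code:java}
// Sketch: answer a heartbeat from an unknown datanode with a
// re-register command instead of only logging a warning.
SCMHeartbeatResponseProto handleHeartbeat(DatanodeID datanodeID) {
  if (!isRegistered(datanodeID)) {
    return SCMHeartbeatResponseProto.newBuilder()
        .addCommands(SCMCommandResponseProto.newBuilder()
            .setCmdType(Type.reregisterCommand)
            .build())
        .build();
  }
  return processHeartbeat(datanodeID); // normal path
}
{code}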



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11761) Ozone: Get container report should closed container reports

2017-05-06 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11761:
--

 Summary: Ozone: Get container report should closed container 
reports
 Key: HDFS-11761
 URL: https://issues.apache.org/jira/browse/HDFS-11761
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
 Attachments: HDFS-11761-HDFS-7240.001.patch

{{ContainerManagerImpl#getContainerReports}} should return only closed 
container reports, but it seems to mistakenly return open ones instead. We 
also need to add a unit test for this operation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11725) Ozone: Revise create container CLI specification and implementation

2017-04-30 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11725:
--

 Summary: Ozone: Revise create container CLI specification and 
implementation
 Key: HDFS-11725
 URL: https://issues.apache.org/jira/browse/HDFS-11725
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Per [design 
doc|https://issues.apache.org/jira/secure/attachment/12861478/storage-container-manager-cli-v002.pdf]
 in HDFS-11470

{noformat}
hdfs scm -container create -p 

Notes : This command connects to SCM and creates a container. Once the 
container is created in the SCM, the corresponding container is created at the 
appropriate datanode. Optional -p allows the user to control which pipeline to 
use while creating this container, this is strictly for debugging and testing.
{noformat}

There are 2 problems with this design. First, it does not support a container 
name, yet one is quite useful for testing. Second, it supports an optional 
option for the pipeline, which is not quite necessary right now given that SCM 
handles the creation of the pipelines; we might want to support this later. So 
it is proposed to revise the CLI to

{code}
hdfs scm -container create -c 
{code}

The {{-c}} option is *required*. On the backend it performs the following steps:
# Given the container name, ask SCM where the container should be replicated to. 
This returns a pipeline.
# Communicate with each datanode in the pipeline to create the container.

This jira is to track the work to update both the design doc and the 
implementation; a sketch of the backend flow follows.
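
A client-side sketch of those two steps (the helper classes mirror existing 
ozone client code but are assumptions as written):

{code:java}
// Sketch of the "-container create -c" backend flow.
public void createContainer(String containerName) throws IOException {
  // Step 1: ask SCM where the container should be replicated;
  // this returns a pipeline.
  Pipeline pipeline =
      storageContainerLocationClient.allocateContainer(containerName);
  // Step 2: talk to the datanodes in that pipeline to create the container.
  XceiverClientSpi client = xceiverClientManager.acquireClient(pipeline);
  try {
    ContainerProtocolCalls.createContainer(client, containerName);
  } finally {
    xceiverClientManager.releaseClient(client);
  }
}
{code}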



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11716) Revisit delete container API

2017-04-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11716:
--

 Summary: Revisit delete container API
 Key: HDFS-11716
 URL: https://issues.apache.org/jira/browse/HDFS-11716
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


The current delete container API can possibly run into an inconsistent state. 
SCM maintains a mapping of containers to nodes, and the datanode maintains 
the actual container's data. When deleting a container, we need to make sure 
the container db is removed and the mapping in SCM also gets updated. What if 
the datanode fails to remove a container's data; do we still update the 
mapping? We need to revisit the implementation and get these issues 
addressed. See more discussion 
[here|https://issues.apache.org/jira/browse/HDFS-11675?focusedCommentId=15987798&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15987798]
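
One possible ordering that avoids the inconsistency, sketched with invented interfaces: delete the data on every datanode first, and update the SCM mapping only after all deletes succeed, so a failure leaves the mapping intact for a retry.

{code:java}
import java.io.IOException;
import java.util.List;

// Invented interfaces; the point is the ordering, not the actual API.
class DeleteOrderingSketch {
  interface DatanodeClient {
    void deleteContainer(String name) throws IOException;
  }
  interface ScmMapping {
    void removeContainer(String name);
  }

  // Delete the container data on every datanode first; drop the SCM
  // container-to-nodes mapping only once all deletes succeeded, so a
  // datanode failure aborts here and leaves the mapping usable for retry.
  static void delete(String name, List<DatanodeClient> nodes,
      ScmMapping mapping) throws IOException {
    for (DatanodeClient node : nodes) {
      node.deleteContainer(name);
    }
    mapping.removeContainer(name);
  }
}
{code}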



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11678) Ozone: SCM CLI: Implement get container metrics command

2017-04-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11678:
--

 Summary: Ozone: SCM CLI: Implement get container metrics command
 Key: HDFS-11678
 URL: https://issues.apache.org/jira/browse/HDFS-11678
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Weiwei Yang


Implement the command to get container metrics

{code}
hdfs scm -container metrics <container-name>
{code}

This command returns container metrics in a certain format, e.g. JSON.
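
A minimal sketch of rendering such metrics as JSON; the metric names here are invented for illustration, and Jackson (already a Hadoop dependency) does the serialization:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

// Metric names are invented for illustration.
class ContainerMetricsSketch {
  static String toJson(long usedBytes, long keyCount) throws Exception {
    Map<String, Object> metrics = new LinkedHashMap<>();
    metrics.put("usedBytes", usedBytes);
    metrics.put("keyCount", keyCount);
    return new ObjectMapper().writeValueAsString(metrics);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(toJson(1024L, 42L)); // {"usedBytes":1024,"keyCount":42}
  }
}
{code}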



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11677) OZone: SCM CLI: Implement get container command

2017-04-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11677:
--

 Summary: OZone: SCM CLI: Implement get container command
 Key: HDFS-11677
 URL: https://issues.apache.org/jira/browse/HDFS-11677
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


Implement get container

{code}
hdfs scm -container get <container-name> -o <output-path>
{code}

This command works only against a closed container. If the container is closed, 
then SCM will return the address of the datanodes. The datanodes support an API 
called copyContainer, which returns the container as a tar ball.
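
A minimal client-side sketch, assuming an invented {{CopyContainerClient}} interface standing in for the copyContainer API:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// CopyContainerClient is an invented stand-in for the copyContainer API.
class GetContainerSketch {
  interface CopyContainerClient {
    // Streams the closed container back as a tar ball.
    InputStream copyContainer(String containerName) throws IOException;
  }

  // Fetch the tar ball and write it under the -o output directory.
  static void get(CopyContainerClient client, String containerName,
      String outputDir) throws IOException {
    Path target = Paths.get(outputDir, containerName + ".tar");
    try (InputStream in = client.copyContainer(containerName)) {
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}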



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11676) Ozone: SCM CLI: Implement close container command

2017-04-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11676:
--

 Summary: Ozone: SCM CLI: Implement close container command
 Key: HDFS-11676
 URL: https://issues.apache.org/jira/browse/HDFS-11676
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


Implement close container

{code}
hdfs scm -container close <container-name>
{code}

This command connects to SCM and closes a container. Once the container is 
closed in the SCM, the corresponding container is closed at the appropriate 
datanode. If the container does not exist, it will return an error.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11675) Ozone: SCM CLI: Implement delete container command

2017-04-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11675:
--

 Summary: Ozone: SCM CLI: Implement delete container command
 Key: HDFS-11675
 URL: https://issues.apache.org/jira/browse/HDFS-11675
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


Implement delete container

{code}
hdfs scm -container del <container-name> -f
{code}

Deletes a container if it is empty. The -f option can be used to force the 
delete of a non-empty container. If the specified container does not exist, 
it prints a clear error message.
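
A minimal sketch of these semantics, with an invented {{ContainerStore}} interface standing in for the SCM-side container state:

{code:java}
import java.io.IOException;

// ContainerStore is an invented stand-in for the SCM-side container state.
class DeleteCommandSketch {
  interface ContainerStore {
    boolean exists(String name);
    boolean isEmpty(String name);
    void delete(String name) throws IOException;
  }

  // Refuse to delete a non-empty container unless -f was given, and fail
  // with a clear message when the container does not exist.
  static void delete(ContainerStore store, String name, boolean force)
      throws IOException {
    if (!store.exists(name)) {
      throw new IOException("Container " + name + " does not exist");
    }
    if (!store.isEmpty(name) && !force) {
      throw new IOException("Container " + name
          + " is not empty; use -f to force the delete");
    }
    store.delete(name);
  }
}
{code}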



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11668) Ozone: misc improvements for SCM CLI

2017-04-18 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11668:
--

 Summary: Ozone: misc improvements for SCM CLI
 Key: HDFS-11668
 URL: https://issues.apache.org/jira/browse/HDFS-11668
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Once HDFS-11649 is done, there are some improvements that need to be made in 
order to make the SCM CLI better to use in a pseudo cluster; this includes

# HDFS-11649 adds java classes for the CLIs; we will need to add shell code 
to expose these commands
# Better error messages when some key ozone configurations are missing, e.g. 
{{ozone.scm.names}}, {{ozone.scm.datanode.id}} ... etc
# Property {{ozone.enabled}} is not honored; the cause is not yet known
# Better logging. Currently {{DatanodeStateMachine}} prints very limited 
logs; adding some more logs to indicate state transitions is necessary.

The ultimate goal of this ticket is to ensure SCM CLI can work nicely on a 
pseudo cluster.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11658) Ozone: SCM daemon is unable to be started via CLI

2017-04-17 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11658:
--

 Summary: Ozone: SCM daemon is unable to be started via CLI
 Key: HDFS-11658
 URL: https://issues.apache.org/jira/browse/HDFS-11658
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


The SCM daemon can no longer be started via the CLI since the 
{{StorageContainerManager}} class's package was renamed from 
{{org.apache.hadoop.ozone.storage.StorageContainerManager}} to 
{{org.apache.hadoop.ozone.scm.StorageContainerManager}} in HDFS-11184.
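
For illustration, a bin/hdfs style case entry pointing at the new package; the surrounding shell layout is assumed from the trunk shell scripts, and only the class name comes from this issue:

{code}
# Assumed bin/hdfs wiring; only the class name is taken from this issue.
scm)
  HADOOP_CLASSNAME=org.apache.hadoop.ozone.scm.StorageContainerManager
;;
{code}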



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11655) Ozone: CLI: Guarantee users running ozone commands have appropriate permissions

2017-04-13 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11655:
--

 Summary: Ozone: CLI: Guarantee users running ozone commands have 
appropriate permissions
 Key: HDFS-11655
 URL: https://issues.apache.org/jira/browse/HDFS-11655
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


We need to add a permission check module for the ozone command line 
utilities, to make sure users run commands with proper privileges. For now, 
all commands in the [design doc| 
https://issues.apache.org/jira/secure/attachment/12861478/storage-container-manager-cli-v002.pdf]
 require admin privilege.
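
A minimal sketch of such a check using {{UserGroupInformation}}; the admin-list wiring is invented for illustration, and only the {{UserGroupInformation}} lookup is the real Hadoop API:

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.security.UserGroupInformation;

// The admin-list wiring is invented for illustration.
class OzoneAdminCheckSketch {
  static void checkAdminAccess(List<String> admins) throws IOException {
    String caller = UserGroupInformation.getCurrentUser().getShortUserName();
    if (!admins.contains(caller)) {
      throw new IOException("Access denied: user " + caller
          + " does not have ozone admin privilege");
    }
  }

  public static void main(String[] args) throws IOException {
    checkAdminAccess(Arrays.asList("hdfs", "ozone"));
  }
}
{code}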



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11625) Ozone: Replace hard coded datanode data dir in test code with getStorageDir to fix UT failures

2017-04-05 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11625:
--

 Summary: Ozone: Replace hard coded datanode data dir in test code 
with getStorageDir to fix UT failures
 Key: HDFS-11625
 URL: https://issues.apache.org/jira/browse/HDFS-11625
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


There seem to be some UT regressions after HDFS-11519, such as

* TestDataNodeVolumeFailureToleration
* TestDataNodeVolumeFailureReporting
* TestDiskBalancerCommand
* TestBlockStatsMXBean
* TestDataNodeVolumeMetrics
* TestDFSAdmin
* TestDataNodeHotSwapVolumes
* TestDataNodeVolumeFailure

These tests set up datanode data dirs with hard-coded names, such as 
{code}
  new File(cluster.getDataDirectory(), "data1");
{code}

This no longer works since HDFS-11519 changed the pattern from

{code}
/data/data<2*dnIndex + 1>
/data/data<2*dnIndex + 2>
...
{code}

to 

{code}
/data/dn0_data0
/data/dn0_data1
/data/dn1_data0
/data/dn1_data1
...
{code}
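
A sketch of the fix direction named in the summary: resolve the directory through the MiniDFSCluster helper instead of hard coding a name. Shown here with {{getInstanceStorageDir}}; the exact helper introduced by HDFS-11519 may differ:

{code:java}
import java.io.File;
import org.apache.hadoop.hdfs.MiniDFSCluster;

class DataDirFixSketch {
  static File firstDataDirOf(MiniDFSCluster cluster) {
    // Before (breaks once the layout changed):
    //   new File(cluster.getDataDirectory(), "data1");
    // After: first storage dir of datanode 0, whatever the naming pattern.
    return cluster.getInstanceStorageDir(0, 0);
  }
}
{code}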



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11585) Ozone: Support force update a container

2017-03-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11585:
--

 Summary: Ozone: Support force update a container
 Key: HDFS-11585
 URL: https://issues.apache.org/jira/browse/HDFS-11585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-11567 added support for updating a container, but in the following cases

# Container is closed
# Container meta file is falsely removed on disk or corrupted

a container cannot be gracefully updated. It is useful to support a forced 
update if a container gets into such a state, as that gives us the chance to 
repair the metadata.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11581) Ozone: Support force delete a container

2017-03-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11581:
--

 Summary: Ozone: Support force delete a container
 Key: HDFS-11581
 URL: https://issues.apache.org/jira/browse/HDFS-11581
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang


On some occasions, we may want to forcibly delete a container regardless of 
whether the deletion condition is satisfied, e.g. the container is empty. 
This way we can make a best effort to clean up containers. Note, only a 
CLOSED container can be force deleted. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11569) Ozone: Implement listKey function for KeyManager

2017-03-23 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11569:
--

 Summary: Ozone: Implement listKey function for KeyManager
 Key: HDFS-11569
 URL: https://issues.apache.org/jira/browse/HDFS-11569
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


List keys by prefix from a container. This doesn't need to support 
pagination, as the number of keys in a single container should be manageable.
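
A minimal sketch of prefix listing over a sorted key space; a {{TreeMap}} stands in for the container's key store, and only the prefix-scan logic is the point:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// A TreeMap stands in for the container's sorted key store.
class ListKeySketch {
  static List<String> listKeys(TreeMap<String, byte[]> keys, String prefix) {
    List<String> result = new ArrayList<>();
    // tailMap positions the scan at the first key >= prefix; since the map
    // is sorted, we can stop as soon as a key no longer carries the prefix.
    for (String key : keys.tailMap(prefix).keySet()) {
      if (!key.startsWith(prefix)) {
        break;
      }
      result.add(key);
    }
    return result;
  }
}
{code}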



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11567) Support update container

2017-03-23 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-11567:
--

 Summary: Support update container
 Key: HDFS-11567
 URL: https://issues.apache.org/jira/browse/HDFS-11567
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Add support to update a container.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11413) HDFS fsck command shows health as corrupt for '/'

2017-02-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved HDFS-11413.

Resolution: Not A Bug

> HDFS fsck command shows health as corrupt for '/'
> -
>
> Key: HDFS-11413
> URL: https://issues.apache.org/jira/browse/HDFS-11413
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Nishant Verma
>
> I have open source hadoop version 2.7.3 cluster (2 Masters + 3 Slaves) 
> installed on AWS EC2 instances. I am using the cluster to integrate it with 
> Kafka Connect. 
> The setup of cluster was done last month and setup of kafka connect was 
> completed last fortnight. Since then, we were able to operate the kafka topic 
> records on our HDFS and do various operations.
> Since yesterday afternoon, I find that kafka topics are no longer getting 
> committed to the cluster. When I tried to open the older files, I started 
> getting the error below. When I copy a new file to the cluster from local, 
> it opens at first but after some time starts showing a similar IOException:
> ==
> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
> java.io.IOException: No live nodes contain block 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
> nodes = [], ignoredNodes = null No live nodes contain current block Block 
> locations: Dead nodes: . Will get new block locations from namenode and 
> retry...
> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 
> IOException, will wait for 499.3472970548959 msec.
> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
> java.io.IOException: No live nodes contain block 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
> nodes = [], ignoredNodes = null No live nodes contain current block Block 
> locations: Dead nodes: . Will get new block locations from namenode and 
> retry...
> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 
> IOException, will wait for 4988.873277172643 msec.
> 17/02/14 07:58:00 INFO hdfs.DFSClient: No node available for 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
> 17/02/14 07:58:00 INFO hdfs.DFSClient: Could not obtain 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: 
> java.io.IOException: No live nodes contain block 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking 
> nodes = [], ignoredNodes = null No live nodes contain current block Block 
> locations: Dead nodes: . Will get new block locations from namenode and 
> retry...
> 17/02/14 07:58:00 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 
> IOException, will wait for 8598.311122824263 msec.
> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log No live nodes contain current block Block 
> locations: Dead nodes: . Throwing a BlockMissingException
> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log No live nodes contain current block Block 
> locations: Dead nodes: . Throwing a BlockMissingException
> 17/02/14 07:58:09 WARN hdfs.DFSClient: DFS Read
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: 
> BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 
> file=/test/inputdata/derby.log
> at 
> org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:983)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
> at 
> org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
> at 
> 
