[jira] [Reopened] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
[ https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reopened HDFS-12748:

> NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
>
> Key: HDFS-12748
> URL: https://issues.apache.org/jira/browse/HDFS-12748
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.8.2
> Reporter: Jiandan Yang
> Assignee: Weiwei Yang
> Priority: Major
> Fix For: 3.3.0, 3.2.1
> Attachments: HDFS-12748-branch-3.1.01.patch, HDFS-12748.001.patch, HDFS-12748.002.patch, HDFS-12748.003.patch, HDFS-12748.004.patch, HDFS-12748.005.patch
>
> In our production environment, the standby NN often does full GC; using MAT we found the largest object is FileSystem$Cache, which contains 7,844,890 DistributedFileSystem instances.
> By viewing the call hierarchy of FileSystem.get(), I found that only NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why it creates a different DistributedFileSystem every time instead of getting a FileSystem from the cache.
> {code:java}
> case GETHOMEDIRECTORY: {
>   final String js = JsonUtil.toJsonString("Path",
>       FileSystem.get(conf != null ? conf : new Configuration())
>           .getHomeDirectory().toUri().getPath());
>   return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
> }
> {code}
> When we close the FileSystem after GETHOMEDIRECTORY, the NN no longer does full GC.
> {code:java}
> case GETHOMEDIRECTORY: {
>   FileSystem fs = null;
>   try {
>     fs = FileSystem.get(conf != null ? conf : new Configuration());
>     final String js = JsonUtil.toJsonString("Path",
>         fs.getHomeDirectory().toUri().getPath());
>     return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
>   } finally {
>     if (fs != null) {
>       fs.close();
>     }
>   }
> }
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
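The growth mechanism can be illustrated outside Hadoop: FileSystem.get() caches instances under a key that includes the requesting user, so if each request presents a user object that never compares equal to a previous one, every lookup misses and the cache accumulates an entry per request. The sketch below is a simplified, hypothetical model of such a cache — {{Key}} and {{CachedFs}} are illustrative stand-ins, not Hadoop's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a FileSystem-style cache keyed by (scheme+authority, user).
// When the user component is never equal across requests, the cache leaks.
class LeakyFsCacheDemo {
    // Stand-in for a cached DistributedFileSystem instance.
    static class CachedFs {}

    // Cache key: scheme+authority plus a user object compared by equals/hashCode.
    static class Key {
        final String schemeAndAuthority;
        final Object user;
        Key(String sa, Object user) { this.schemeAndAuthority = sa; this.user = user; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return schemeAndAuthority.equals(k.schemeAndAuthority) && user.equals(k.user);
        }
        @Override public int hashCode() { return schemeAndAuthority.hashCode() ^ user.hashCode(); }
    }

    final Map<Key, CachedFs> cache = new HashMap<>();

    CachedFs get(String sa, Object user) {
        // Miss => create and retain a new instance, exactly like the cache filling up.
        return cache.computeIfAbsent(new Key(sa, user), k -> new CachedFs());
    }
}
```

With identity-equal user objects (the `Object` default), 1,000 simulated requests leave 1,000 cache entries; with a value-equal user, the same requests share one. Closing the FileSystem after each request, as in the patch above, releases the per-request instance instead of letting it accumulate.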
[jira] [Created] (HDFS-13351) Revert HDFS-11156 from branch-2
Weiwei Yang created HDFS-13351: -- Summary: Revert HDFS-11156 from branch-2 Key: HDFS-13351 URL: https://issues.apache.org/jira/browse/HDFS-13351 Project: Hadoop HDFS Issue Type: Task Components: webhdfs Reporter: Weiwei Yang Assignee: Weiwei Yang Per the discussion in HDFS-11156, let's revert the change from branch-2. The new patch can be tracked in HDFS-12459.
[jira] [Resolved] (HDFS-12936) java.lang.OutOfMemoryError: unable to create new native thread
[ https://issues.apache.org/jira/browse/HDFS-12936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12936. Resolution: Not A Bug

> java.lang.OutOfMemoryError: unable to create new native thread
>
> Key: HDFS-12936
> URL: https://issues.apache.org/jira/browse/HDFS-12936
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Environment: CDH5.12, hadoop2.6
> Reporter: Jepson
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> I configured the max user processes to 65535 for every user, and the datanode memory is 8G. When a lot of data was being written, the datanode was shut down, yet I can see the memory usage was only < 1000M. Please see https://pan.baidu.com/s/1o7BE0cy
>
> *DataNode shutdown error log:*
> {code:java}
> 2017-12-17 23:58:14,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1437036909-192.168.17.36-1509097205664:blk_1074725940_987917, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2017-12-17 23:58:31,425 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:714)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:01,426 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:714)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:05,520 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:714)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-12-17 23:59:31,429 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1437036909-192.168.17.36-1509097205664:blk_1074725951_987928 src: /192.168.17.54:40478 dest: /192.168.17.48:50010
> {code}
[jira] [Created] (HDFS-12770) Add doc about how to disable client socket cache
Weiwei Yang created HDFS-12770: -- Summary: Add doc about how to disable client socket cache Key: HDFS-12770 URL: https://issues.apache.org/jira/browse/HDFS-12770 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor After HDFS-3365, the client socket cache (PeerCache) can be disabled, but there is no doc about this. We should add some doc in hdfs-default.xml to instruct users how to disable it.
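For context, HDFS-3365 made the cache capacity configurable; a hedged sketch of what the documented snippet might look like follows. The property name is taken from HDFS-3365 from memory — verify it against hdfs-default.xml for your release before relying on it:

```xml
<!-- hdfs-site.xml: a capacity of 0 disables the DFSClient socket cache
     (PeerCache) entirely, per HDFS-3365. Property name is an assumption;
     confirm against hdfs-default.xml for your Hadoop version. -->
<property>
  <name>dfs.client.socketcache.capacity</name>
  <value>0</value>
  <description>Size of the DFSClient socket cache. Set to 0 to disable
  client socket caching.</description>
</property>
```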
[jira] [Resolved] (HDFS-12757) DeadLock Happened Between DFSOutputStream and LeaseRenewer when LeaseRenewer#renew SocketTimeException
[ https://issues.apache.org/jira/browse/HDFS-12757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12757. Resolution: Duplicate

> DeadLock Happened Between DFSOutputStream and LeaseRenewer when LeaseRenewer#renew SocketTimeException
>
> Key: HDFS-12757
> URL: https://issues.apache.org/jira/browse/HDFS-12757
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Reporter: Jiandan Yang
> Priority: Major
> Attachments: HDFS-12757.patch
>
> The Java stack is:
> {code:java}
> Found one Java-level deadlock:
> "Topology-2 (735/2000)":
>   waiting to lock monitor 0x7fff4523e6e8 (object 0x0005d3521078, a org.apache.hadoop.hdfs.client.impl.LeaseRenewer),
>   which is held by "LeaseRenewer:admin@na61storage"
> "LeaseRenewer:admin@na61storage":
>   waiting to lock monitor 0x7fff5d41e838 (object 0x0005ec0dfa88, a org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "Topology-2 (735/2000)"
> Java stack information for the threads listed above:
> "Topology-2 (735/2000)":
>   at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.addClient(LeaseRenewer.java:227)
>   - waiting to lock <0x0005d3521078> (a org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.getInstance(LeaseRenewer.java:86)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:467)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:479)
>   at org.apache.hadoop.hdfs.DFSOutputStream.setClosed(DFSOutputStream.java:776)
>   at org.apache.hadoop.hdfs.DFSOutputStream.closeThreads(DFSOutputStream.java:791)
>   at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:848)
>   - locked <0x0005ec0dfa88> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:805)
>   - locked <0x0005ec0dfa88> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>   ..
> "LeaseRenewer:admin@na61storage":
>   at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:750)
>   - waiting to lock <0x0005ec0dfa88> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:586)
>   at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:453)
>   - locked <0x0005d3521078> (a org.apache.hadoop.hdfs.client.impl.LeaseRenewer)
>   at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$700(LeaseRenewer.java:76)
>   at org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:310)
>   at java.lang.Thread.run(Thread.java:834)
> Found 1 deadlock.
> {code}
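The trace above is the classic two-monitor cycle: one thread holds the DFSOutputStream monitor and wants the LeaseRenewer monitor, while the other holds them in the opposite order. One common remedy — a sketch of the general technique, not the actual HDFS fix (this JIRA was resolved as a duplicate) — is to acquire both monitors in a single globally consistent order:

```java
// Sketch: avoiding a two-lock deadlock by always acquiring monitors in a
// fixed global order. Ordering by identity hash is illustrative; production
// code would need a tie-breaker lock for hash collisions.
final class OrderedLocking {
    static void withBoth(Object a, Object b, Runnable critical) {
        // Pick a consistent acquisition order regardless of caller order.
        Object first = System.identityHashCode(a) <= System.identityHashCode(b) ? a : b;
        Object second = (first == a) ? b : a;
        synchronized (first) {
            synchronized (second) {
                critical.run();
            }
        }
    }
}
```

Two threads calling {{withBoth(renewer, stream, ...)}} and {{withBoth(stream, renewer, ...)}} acquire the monitors in the same order, so the cycle in the trace above cannot form between these two locks.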
[jira] [Created] (HDFS-12744) More logs when short-circuit read is failed and disabled
Weiwei Yang created HDFS-12744: -- Summary: More logs when short-circuit read is failed and disabled Key: HDFS-12744 URL: https://issues.apache.org/jira/browse/HDFS-12744 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Weiwei Yang Assignee: Weiwei Yang Short-circuit read (SCR) failed with the following error:
{noformat}
2017-10-21 16:42:28,024 WARN [B.defaultRpcServer.handler=7,queue=7,port=16020] impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR while attempting to set up short-circuit access. Block xxx is not valid
{noformat}
then short-circuit read is disabled for *10 minutes* without any warning message in the log. This cost us extra time to figure out why we had such a long window during which SCR was not working. I propose adding a warning log (as other places already do) to indicate SCR is disabled, plus more logging in the DN to show what happened.
[jira] [Created] (HDFS-12701) More fine-grained locks in ShortCircuitCache
Weiwei Yang created HDFS-12701: -- Summary: More fine-grained locks in ShortCircuitCache Key: HDFS-12701 URL: https://issues.apache.org/jira/browse/HDFS-12701 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.8.1 Reporter: Weiwei Yang When the cluster is heavily loaded, we found HBase regionserver handlers are often blocked by {{ShortCircuitCache}}. We dumped jstack and found lots of threads waiting to obtain the cache lock. Performance should be improvable by using more fine-grained locks.
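One standard way to make a single hot lock more fine-grained is lock striping: hash each key to one of N locks so unrelated keys do not contend. The sketch below illustrates the general technique only — it is not the change actually proposed for ShortCircuitCache:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of lock striping: replace one cache-wide lock with an array of
// stripes indexed by key hash, so operations on different keys can proceed
// concurrently. The stripe count must be a power of two for the mask below.
final class StripedLocks {
    private final ReentrantLock[] stripes;

    StripedLocks(int powerOfTwoStripes) {
        stripes = new ReentrantLock[powerOfTwoStripes];
        for (int i = 0; i < powerOfTwoStripes; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    ReentrantLock stripeFor(Object key) {
        int h = key.hashCode();
        // Spread high bits into low bits, then mask into the stripe array.
        return stripes[(h ^ (h >>> 16)) & (stripes.length - 1)];
    }
}
```

The same key always maps to the same stripe, so per-key mutual exclusion is preserved while cross-key contention drops roughly by the stripe count.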
[jira] [Created] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics
Weiwei Yang created HDFS-12684: -- Summary: Ozone: SCM metrics NodeCount is overlapping with node manager metrics Key: HDFS-12684 URL: https://issues.apache.org/jira/browse/HDFS-12684 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, scm Reporter: Weiwei Yang Priority: Minor I found this issue while reviewing HDFS-11468. From http://scm_host:9876/jmx, both SCM and SCMNodeManager have {{NodeCount}} metrics:
{noformat}
{
  "name" : "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
  "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
  "ClientRpcPort" : "9860",
  "DatanodeRpcPort" : "9861",
  "NodeCount" : [ { "key" : "STALE", "value" : 0 }, { "key" : "DECOMMISSIONING", "value" : 0 }, { "key" : "DECOMMISSIONED", "value" : 0 }, { "key" : "FREE_NODE", "value" : 0 }, { "key" : "RAFT_MEMBER", "value" : 0 }, { "key" : "HEALTHY", "value" : 0 }, { "key" : "DEAD", "value" : 0 }, { "key" : "UNKNOWN", "value" : 0 } ],
  "CompileInfo" : "2017-10-17T06:47Z xxx",
  "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
  "SoftwareVersion" : "3.1.0-SNAPSHOT",
  "StartedTimeInMillis" : 1508393551065
},
{
  "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
  "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
  "NodeCount" : [ { "key" : "STALE", "value" : 0 }, { "key" : "DECOMMISSIONING", "value" : 0 }, { "key" : "DECOMMISSIONED", "value" : 0 }, { "key" : "FREE_NODE", "value" : 0 }, { "key" : "RAFT_MEMBER", "value" : 0 }, { "key" : "HEALTHY", "value" : 0 }, { "key" : "DEAD", "value" : 0 }, { "key" : "UNKNOWN", "value" : 0 } ],
  "OutOfChillMode" : false,
  "MinimumChillModeNodes" : 1,
  "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 0 nodes reported, minimal 1 nodes required."
}
{noformat}
Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.
[jira] [Resolved] (HDFS-12401) Ozone: TestBlockDeletingService#testBlockDeletionTimeout sometimes timeout
[ https://issues.apache.org/jira/browse/HDFS-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12401. Resolution: Cannot Reproduce > Ozone: TestBlockDeletingService#testBlockDeletionTimeout sometimes timeout > -- > > Key: HDFS-12401 > URL: https://issues.apache.org/jira/browse/HDFS-12401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: HDFS-7240 >Affects Versions: HDFS-7240 >Reporter: Xiaoyu Yao >Assignee: Weiwei Yang >Priority: Minor > > {code} > testBlockDeletionTimeout(org.apache.hadoop.ozone.container.common.TestBlockDeletingService) > Time elapsed: 100.383 sec <<< ERROR! > java.util.concurrent.TimeoutException: Timed out waiting for condition. > Thread diagnostics: > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12546) Ozone: DB listing operation performance improvement
Weiwei Yang created HDFS-12546: -- Summary: Ozone: DB listing operation performance improvement Key: HDFS-12546 URL: https://issues.apache.org/jira/browse/HDFS-12546 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang While investigating HDFS-12506, I found there are several {{getRangeKVs}} calls that can be replaced by {{getSequentialRangeKVs}} to improve performance. This JIRA is to track these improvements with sufficient tests.
[jira] [Created] (HDFS-12540) Ozone: node status text reported by SCM is confusing
Weiwei Yang created HDFS-12540: -- Summary: Ozone: node status text reported by SCM is confusing Key: HDFS-12540 URL: https://issues.apache.org/jira/browse/HDFS-12540 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Priority: Trivial At present the SCM UI displays node status like the following:
{noformat}
Node Manager: Chill mode status: Out of chill mode. 15 of out of total 1 nodes have reported in.
{noformat}
This text is a bit confusing. The UI retrieves status from {{SCMNodeManager#getNodeStatus}}; the related call is {{#getChillModeStatus}}.
[jira] [Created] (HDFS-12539) Ozone: refactor some functions in KSMMetadataManagerImpl to be more readable and reusable
Weiwei Yang created HDFS-12539: -- Summary: Ozone: refactor some functions in KSMMetadataManagerImpl to be more readable and reusable Key: HDFS-12539 URL: https://issues.apache.org/jira/browse/HDFS-12539 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Priority: Minor This is from [~anu]'s review comment in HDFS-12506, [https://issues.apache.org/jira/browse/HDFS-12506?focusedCommentId=16178356=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16178356]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-12415) Ozone: TestXceiverClientManager and TestAllocateContainer occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reopened HDFS-12415: > Ozone: TestXceiverClientManager and TestAllocateContainer occasionally fails > > > Key: HDFS-12415 > URL: https://issues.apache.org/jira/browse/HDFS-12415 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12415-HDFS-7240.001.patch, > HDFS-12415-HDFS-7240.002.patch, HDFS-12415-HDFS-7240.003.patch > > > TestXceiverClientManager seems to be occasionally failing in some jenkins > jobs, > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828) > at > org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147) > at > org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125) > {noformat} > see more from [this > report|https://builds.apache.org/job/PreCommit-HDFS-Build/21065/testReport/] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12524) Ozone: Record number of keys scanned and hinted for getRangeKVs call
Weiwei Yang created HDFS-12524: -- Summary: Ozone: Record number of keys scanned and hinted for getRangeKVs call Key: HDFS-12524 URL: https://issues.apache.org/jira/browse/HDFS-12524 Project: Hadoop HDFS Issue Type: Sub-task Components: logging, ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor Add debug logging to record the number of keys scanned and hinted for {{getRangeKVs}} calls; this will be helpful for debugging performance issues since {{getRangeKVs}} is often where the lag occurs.
[jira] [Created] (HDFS-12506) Ozone: ListBucket is too slow
Weiwei Yang created HDFS-12506: -- Summary: Ozone: ListBucket is too slow Key: HDFS-12506 URL: https://issues.apache.org/jira/browse/HDFS-12506 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Blocker Generated 3 million keys in ozone, then ran the {{listBucket}} command to get the list of buckets under a volume:
{code}
bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
{code}
This call took over *15 seconds* to finish. The problem is caused by the inflexible structure of the KSM DB. Right now {{ksm.db}} stores keys like the following:
{code}
/v1/b1
/v1/b1/k1
/v1/b1/k2
/v1/b1/k3
/v1/b2
/v1/b2/k1
/v1/b2/k2
/v1/b2/k3
/v1/b3
/v1/b4
{code}
Keys are sorted in natural order, so when we list buckets under a volume, e.g. /v1, we seek to the /v1 point and start to iterate and filter keys; this ends up scanning all keys under volume /v1. The problem with this design is that we don't have an efficient way to locate all buckets without scanning the keys.
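The cost described above can be reproduced with any sorted map: because bucket entries and object-key entries are interleaved under one volume prefix, listing buckets must walk every key in the volume. A sketch, using the key layout from the example (detecting buckets by path depth is an assumption of this model, not KSM's actual scheme):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Model of the flat KSM key layout: buckets (/vol/bucket) and object keys
// (/vol/bucket/key) share one sorted namespace, so listing the buckets of a
// volume must iterate every entry under that volume's prefix.
final class FlatNamespaceScan {
    static List<String> listBuckets(TreeMap<String, String> db, String volume) {
        List<String> buckets = new ArrayList<>();
        String prefix = volume + "/";
        // tailMap seeks to the volume prefix in O(log n); the iteration
        // still touches every key under the volume, not just bucket entries.
        for (String k : db.tailMap(prefix).keySet()) {
            if (!k.startsWith(prefix)) break;          // left the volume's range
            if (k.indexOf('/', prefix.length()) < 0) { // depth 2 => bucket entry
                buckets.add(k);
            }
        }
        return buckets;
    }
}
```

The seek is cheap; the scan is what costs 15 seconds at 3 million keys. A layout that keeps bucket entries in their own prefix (or column family) would let the iteration stop after the buckets themselves.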
[jira] [Created] (HDFS-12504) Ozone: Improve SQLCLI performance
Weiwei Yang created HDFS-12504: -- Summary: Ozone: Improve SQLCLI performance Key: HDFS-12504 URL: https://issues.apache.org/jira/browse/HDFS-12504 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang In my test, my {{ksm.db}} has *3017660* entries with a total size of *128mb*; the SQLCLI tool ran for over *2 hours* but still did not finish exporting the DB. This is because it iterates each entry and inserts it into another SQLite DB file, which is not efficient. We need to improve this so it runs efficiently on large DB files.
[jira] [Created] (HDFS-12503) Ozone: some UX improvements to oz_debug
Weiwei Yang created HDFS-12503: -- Summary: Ozone: some UX improvements to oz_debug Key: HDFS-12503 URL: https://issues.apache.org/jira/browse/HDFS-12503 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang I tried to use {{oz_debug}} to dump the KSM DB for offline analysis and found a few problems that need to be fixed to make this tool easier to use. I know this is a debug tool for admins, but it's still necessary to improve the UX so new users (like me) can figure out how to use it without reading more docs.
# Support the *--help* argument; --help is the general arg for all hdfs scripts to print usage.
# When specifying the output path {{-o}}, we need a description letting the user know the path needs to be a file (instead of a dir). If the path is specified as a dir, it ends up with a funny error, {{unable to open the database file (out of memory)}}, which is pretty misleading. It would also help to add a check that the specified path is not an existing dir.
# SQLCLI currently swallows exceptions.
# We should remove {{levelDB}} wording from the command output, as we use rocksDB by default.
[jira] [Created] (HDFS-12500) Ozone: add logger for oz shell commands and move error stack traces to DEBUG level
Weiwei Yang created HDFS-12500: -- Summary: Ozone: add logger for oz shell commands and move error stack traces to DEBUG level Key: HDFS-12500 URL: https://issues.apache.org/jira/browse/HDFS-12500 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Priority: Minor Per the discussion in HDFS-12489 about reducing the verbosity of logs when an exception happens, let's add a logger to {{Shell.java}} and move error stack traces to DEBUG level.
[jira] [Created] (HDFS-12492) Ozone: ListVolume output misses some attributes
Weiwei Yang created HDFS-12492: -- Summary: Ozone: ListVolume output misses some attributes Key: HDFS-12492 URL: https://issues.apache.org/jira/browse/HDFS-12492 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang When doing a listVolume call, we get output like the following:
{noformat}
[ {
  "owner" : { "name" : "wwei" },
  "quota" : { "unit" : "TB", "size" : 1048576 },
  "volumeName" : "vol-0-84022",
  "createdOn" : "Mon, 18 Sep 2017 03:09:46 GMT",
  "createdBy" : null,
  "bytesUsed" : 0,
  "bucketCount" : 0
{noformat}
Values for *createdOn*, *createdBy*, *bytesUsed* and *bucketCount* are all missing.
[jira] [Created] (HDFS-12489) Ozone: OzoneRestClientException swallows exceptions which makes client hard to debug failures
Weiwei Yang created HDFS-12489: -- Summary: Ozone: OzoneRestClientException swallows exceptions which makes it hard for clients to debug failures Key: HDFS-12489 URL: https://issues.apache.org/jira/browse/HDFS-12489 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang There are multiple try-catch blocks that swallow exceptions when transforming some other exception into {{OzoneRestClientException}}. As a result, when clients run into such code paths, they lose track of what was going on, which makes debugging extremely difficult. See the example below:
{code}
bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-84022 -user wwei
Command Failed : {"httpCode":0,"shortMessage":"Read timed out","resource":null,"message":"Read timed out","requestID":null,"hostName":null}
{code}
The returned message doesn't help much in debugging where and how the read timed out.
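The usual fix for this pattern is to chain the underlying exception as the cause rather than keeping only its message. A sketch of the technique — {{ClientException}} below is a hypothetical stand-in for OzoneRestClientException, whose real constructors may differ:

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Sketch: translating a low-level exception without losing its stack trace.
// ClientException is an illustrative stand-in for OzoneRestClientException.
final class ClientException extends RuntimeException {
    ClientException(String message, Throwable cause) {
        super(message, cause); // the original failure stays reachable via getCause()
    }
}

final class Translate {
    static String fetch(boolean simulateTimeout) {
        try {
            if (simulateTimeout) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "ok";
        } catch (IOException e) {
            // Wrap instead of swallowing: logs and callers can still see
            // where and why the I/O failed.
            throw new ClientException("listBucket failed: " + e.getMessage(), e);
        }
    }
}
```

With the cause attached, the client-side stack trace shows the originating SocketTimeoutException frame instead of only the opaque "Read timed out" string above.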
[jira] [Created] (HDFS-12488) Ozone: OzoneRestClient has no notion of configuration
Weiwei Yang created HDFS-12488: -- Summary: Ozone: OzoneRestClient has no notion of configuration Key: HDFS-12488 URL: https://issues.apache.org/jira/browse/HDFS-12488 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang When I test ozone on a 15-node cluster with millions of keys, responses of the REST client become slower. The following call times out after the default 5s:
{code}
bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-84022 -user wwei
Command Failed : {"httpCode":0,"shortMessage":"Read timed out","resource":null,"message":"Read timed out","requestID":null,"hostName":null}
{code}
I then increased the timeout by explicitly setting the following property in {{ozone-site.xml}}:
{code}
ozone.client.socket.timeout.ms 1
{code}
but this doesn't work: REST clients are still created with the default *5s* timeout. This needs to be fixed. Just like {{DFSClient}}, we should make {{OzoneRestClient}} configuration-aware, so that clients can adjust client configuration on demand.
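For reference, the override the reporter attempted would look like this in {{ozone-site.xml}} once the client honors configuration. The value below is an arbitrary illustrative placeholder, not the value from the report (which was truncated in transit):

```xml
<!-- ozone-site.xml: intended client read-timeout override. The property
     name is taken from the report above; the value is an example only. -->
<property>
  <name>ozone.client.socket.timeout.ms</name>
  <value>30000</value>
</property>
```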
[jira] [Created] (HDFS-12477) Ozone: Some minor text improvement in SCM web UI
Weiwei Yang created HDFS-12477: -- Summary: Ozone: Some minor text improvement in SCM web UI Key: HDFS-12477 URL: https://issues.apache.org/jira/browse/HDFS-12477 Project: Hadoop HDFS Issue Type: Sub-task Components: scm, ui Reporter: Weiwei Yang Priority: Trivial While trying out the SCM UI, there seem to be some small text problems:
bq. Node Manager: Minimum chill mode nodes)
It has an extra ).
bq. $$hashKey object:9
I am not really sure what this means; is it helpful?
bq. Node counts
Can we place the HEALTHY ones at the top of the table?
bq. Node Manager: Chill mode status: Out of chill mode. 15 of out of total 1 nodes have reported in.
Can we refine this text a bit?
[jira] [Resolved] (HDFS-12461) Ozone: Ozone data placement is not even
[ https://issues.apache.org/jira/browse/HDFS-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12461. Resolution: Not A Problem Assignee: Weiwei Yang Fix Version/s: HDFS-7240

> Ozone: Ozone data placement is not even
>
> Key: HDFS-12461
> URL: https://issues.apache.org/jira/browse/HDFS-12461
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Anu Engineer
> Assignee: Weiwei Yang
> Priority: Blocker
> Labels: ozoneMerge
> Fix For: HDFS-7240
>
> On a machine with 3 data disks, Ozone keeps picking the same disk to place all containers. It looks like we have a bug in the round-robin selection of disks.
> Steps to Reproduce:
> 1. Install an Ozone cluster.
> 2. Make sure that datanodes have more than one disk.
> 3. Run corona a few times; each run creates more containers.
> 4. Log in to a data node.
> 5. Run a command like tree or ls -R /data, or independently verify each location.
[jira] [Created] (HDFS-12463) Ozone: Fix TestXceiverClientMetrics#testMetrics
Weiwei Yang created HDFS-12463: -- Summary: Ozone: Fix TestXceiverClientMetrics#testMetrics Key: HDFS-12463 URL: https://issues.apache.org/jira/browse/HDFS-12463 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Priority: Minor {{TestXceiverClientMetrics#testMetrics}} is failing with the following error in a recent jenkins job:
{noformat}
java.util.ConcurrentModificationException: null
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at org.apache.hadoop.ozone.scm.TestXceiverClientMetrics.lambda$testMetrics$2(TestXceiverClientMetrics.java:134)
{noformat}
It looks like a non-thread-safe list caused this race condition in the test case.
[jira] [Created] (HDFS-12459) Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
Weiwei Yang created HDFS-12459: -- Summary: Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API Key: HDFS-12459 URL: https://issues.apache.org/jira/browse/HDFS-12459 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-11156 was reverted because the implementation was non-optimal. Based on the suggestion from [~shahrs87], we should avoid creating a dfs client to get block locations because that creates an extra RPC call. Instead we should use {{NamenodeProtocols#getBlockLocations}} then convert {{LocatedBlocks}} to {{BlockLocation[]}}.
[jira] [Reopened] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reopened HDFS-11156: > Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API > > > Key: HDFS-11156 > URL: https://issues.apache.org/jira/browse/HDFS-11156 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.7.3 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: 3.0.0-alpha2 > > Attachments: BlockLocationProperties_JSON_Schema.jpg, > BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, > HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, > HDFS-11156.04.patch, HDFS-11156.05.patch, HDFS-11156.06.patch, > HDFS-11156.07.patch, HDFS-11156.08.patch, HDFS-11156.09.patch, > HDFS-11156.10.patch, HDFS-11156.11.patch, HDFS-11156.12.patch, > HDFS-11156.13.patch, HDFS-11156.14.patch, HDFS-11156.15.patch, > HDFS-11156.16.patch, HDFS-11156-branch-2.01.patch, > Output_JSON_format_v10.jpg, SampleResponse_JSON.jpg > > > Following webhdfs REST API > {code} > http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS=0=1 > {code} > will get a response like > {code} > { > "LocatedBlocks" : { > "fileLength" : 1073741824, > "isLastBlockComplete" : true, > "isUnderConstruction" : false, > "lastLocatedBlock" : { ... }, > "locatedBlocks" : [ {...} ] > } > } > {code} > This represents for *o.a.h.h.p.LocatedBlocks*. However according to > *FileSystem* API, > {code} > public BlockLocation[] getFileBlockLocations(Path p, long start, long len) > {code} > clients would expect an array of BlockLocation. This mismatch should be > fixed. Marked as Incompatible change as this will change the output of the > GET_BLOCK_LOCATIONS API. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm
Weiwei Yang created HDFS-12443: -- Summary: Ozone: Improve SCM block deletion throttling algorithm Key: HDFS-12443 URL: https://issues.apache.org/jira/browse/HDFS-12443 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, scm Reporter: Weiwei Yang Assignee: Weiwei Yang Currently SCM periodically scans delLog to send deletion transactions to datanodes. The throttling algorithm is simple: it scans at most {{BLOCK_DELETE_TX_PER_REQUEST_LIMIT}} (by default 50) transactions at a time. This is non-optimal; in the worst case it might cache 50 TXs for 50 different DNs, so each DN only gets 1 TX to proceed with per interval, which makes deletion slow. An improvement is to throttle per datanode, e.g. 50 TXs per datanode per interval. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
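The proposed per-datanode throttling can be sketched as follows; this is a minimal illustrative model, not the real SCM code, and the method and limit names are made up for the example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: instead of one global cap on the number of deletion
// transactions fetched per interval, cap the number of transactions *per
// datanode*, so every DN receives work each round.
class DeletionThrottle {

    // Take at most 'limit' pending transaction IDs for each datanode.
    static Map<String, List<Long>> throttlePerDatanode(
            Map<String, List<Long>> pendingByDn, int limit) {
        Map<String, List<Long>> batch = new HashMap<>();
        for (Map.Entry<String, List<Long>> e : pendingByDn.entrySet()) {
            List<Long> txs = e.getValue();
            int n = Math.min(limit, txs.size());
            batch.put(e.getKey(), new ArrayList<>(txs.subList(0, n)));
        }
        return batch;
    }
}
```

With a single global limit of 50, fifty DNs could each receive one transaction per interval; capping per DN instead lets every node make steady progress.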
[jira] [Resolved] (HDFS-12440) Ozone: TestAllocateContainer fails on jenkins
[ https://issues.apache.org/jira/browse/HDFS-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12440. Resolution: Duplicate Fix Version/s: HDFS-7240 > Ozone: TestAllocateContainer fails on jenkins > - > > Key: HDFS-12440 > URL: https://issues.apache.org/jira/browse/HDFS-12440 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Fix For: HDFS-7240 > > > I am seeing this failure in [this jenkins > report|https://builds.apache.org/job/PreCommit-HDFS-Build/21089/testReport/org.apache.hadoop.ozone.scm/TestAllocateContainer/testAllocate/], > with following error > {noformat} > Stacktrace > java.lang.NullPointerException > at > org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828) > at > org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147) > at > org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125) > at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12440) Ozone: TestAllocateContainer fails on jenkins
Weiwei Yang created HDFS-12440: -- Summary: Ozone: TestAllocateContainer fails on jenkins Key: HDFS-12440 URL: https://issues.apache.org/jira/browse/HDFS-12440 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor I am seeing this failure in [this jenkins report|https://builds.apache.org/job/PreCommit-HDFS-Build/21067/testReport/org.apache.hadoop.ozone.scm/TestAllocateContainer/org_apache_hadoop_ozone_scm_TestAllocateContainer/], with the following error {noformat} Stacktrace java.lang.NullPointerException: null at org.apache.hadoop.hdfs.MiniDFSCluster.setDataNodeStorageCapacities(MiniDFSCluster.java:1715) at org.apache.hadoop.hdfs.MiniDFSCluster.setDataNodeStorageCapacities(MiniDFSCluster.java:1694) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1674) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:882) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:494) at org.apache.hadoop.ozone.MiniOzoneCluster.<init>(MiniOzoneCluster.java:98) at org.apache.hadoop.ozone.MiniOzoneCluster.<init>(MiniOzoneCluster.java:77) at org.apache.hadoop.ozone.MiniOzoneCluster$Builder.build(MiniOzoneCluster.java:441) at org.apache.hadoop.ozone.scm.TestAllocateContainer.init(TestAllocateContainer.java:56) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12415) Ozone: TestXceiverClientManager occasionally fails
Weiwei Yang created HDFS-12415: -- Summary: Ozone: TestXceiverClientManager occasionally fails Key: HDFS-12415 URL: https://issues.apache.org/jira/browse/HDFS-12415 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor TestXceiverClientManager seems to be occasionally failing in some jenkins jobs, {noformat} java.lang.NullPointerException at org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828) at org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147) at org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125) {noformat} see more from [this report|https://builds.apache.org/job/PreCommit-HDFS-Build/21065/testReport/] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12391) Ozone: TestKSMSQLCli is not working as expected
Weiwei Yang created HDFS-12391: -- Summary: Ozone: TestKSMSQLCli is not working as expected Key: HDFS-12391 URL: https://issues.apache.org/jira/browse/HDFS-12391 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, test Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor I found this issue while investigating the {{TestKSMSQLCli}} failure in [this jenkins report|https://builds.apache.org/job/PreCommit-HDFS-Build/20984/testReport/]. The test is supposed to use a parameterized class to test both the {{LevelDB}} and {{RocksDB}} implementations of metadata stores; however, it only tests the default {{RocksDB}} case twice. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12367) Ozone: Too many open files error while running corona
[ https://issues.apache.org/jira/browse/HDFS-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12367. Resolution: Duplicate I think this issue no longer happens to me; closing it as a duplicate of HDFS-12382, as it looks to have been fixed there, thanks [~nandakumar131]. [~msingh] feel free to create another lower-severity JIRA to track the resource leaks you found at the code level. I will close this one as it is no longer a blocker for tests. > Ozone: Too many open files error while running corona > - > > Key: HDFS-12367 > URL: https://issues.apache.org/jira/browse/HDFS-12367 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Reporter: Weiwei Yang >Assignee: Mukul Kumar Singh > > Too many open files error keeps happening to me while using corona, I have > simply setup a single node cluster and run corona to generate 1000 keys, but > I keep getting following error > {noformat} > ./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys > 1000 > 17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1 > 17/08/28 00:47:42 INFO tools.Corona: Mode: offline > 17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1. > 17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1. > 17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000. > 17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with > wwei as owner and quota set to 1152921504606846976 bytes. > 17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread. > ... > ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: > bucket-0-34960 of volume: vol-0-05000. > java.io.IOException: Exception getting XceiverClient. 
> at > org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156) > at > org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289) > at > org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487) > at > org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.IllegalStateException: failed to create a child event loop > at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234) > at com.google.common.cache.LocalCache.get(LocalCache.java:3965) > at > com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) > at > org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144) > ... 
9 more > Caused by: java.lang.IllegalStateException: failed to create a child event > loop > at > io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) > at > io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) > at > io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61) > at > io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52) > at > io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:44) > at > io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:36) > at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76) > at > org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151) > at > org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145) > at > com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) > at > com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > ... 12 more > Caused by: io.netty.channel.ChannelException: failed to open a new selector > at
[jira] [Created] (HDFS-12389) Ozone: oz commandline list calls should return valid JSON format output
Weiwei Yang created HDFS-12389: -- Summary: Ozone: oz commandline list calls should return valid JSON format output Key: HDFS-12389 URL: https://issues.apache.org/jira/browse/HDFS-12389 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang At present the outputs of {{listVolume}}, {{listBucket}} and {{listKey}} are hard to parse. For example, the following call {code} ./bin/hdfs oz -listVolume http://localhost:9864 -user wwei {code} lists all volumes in my cluster and returns {noformat} { "version" : 0, "md5hash" : null, "createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT", "modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT", "size" : 10240, "keyName" : "key-0-22381", "dataFileName" : null } { "version" : 0, "md5hash" : null, "createdOn" : "Mon, 04 Sep 2017 03:25:22 GMT", "modifiedOn" : "Mon, 04 Sep 2017 03:25:22 GMT", "size" : 10240, "keyName" : "key-0-22381", "dataFileName" : null } ... {noformat} This is not valid JSON output, hence it is hard for client scripts to parse for further interaction. Propose to reformat the output as valid JSON data. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
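The difference between the two output shapes can be shown with a tiny sketch; for simplicity it joins pre-rendered JSON object strings, whereas the real fix would serialize the whole list with the JSON library already used by the CLI:

```java
import java.util.List;

// Illustrative sketch: emit one JSON array instead of bare concatenated
// objects, so any standard JSON parser can consume the listing.
class JsonListFormatter {

    // Wrap already-serialized JSON objects into a single valid JSON array.
    static String toJsonArray(List<String> jsonObjects) {
        return "[" + String.join(",", jsonObjects) + "]";
    }
}
```

Concatenated objects like `{...} {...}` are not a single valid JSON document, but `[{...},{...}]` is, which is what makes the output scriptable.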
[jira] [Created] (HDFS-12367) Ozone: Too many open files error while running corona
Weiwei Yang created HDFS-12367: -- Summary: Ozone: Too many open files error while running corona Key: HDFS-12367 URL: https://issues.apache.org/jira/browse/HDFS-12367 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, tools Reporter: Weiwei Yang The "Too many open files" error keeps happening to me while using corona. I simply set up a single-node cluster and ran corona to generate 1000 keys, but I keep getting the following error {noformat} ./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 1000 17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1 17/08/28 00:47:42 INFO tools.Corona: Mode: offline 17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1. 17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1. 17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000. 17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with wwei as owner and quota set to 1152921504606846976 bytes. 17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread. ... ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: bucket-0-34960 of volume: vol-0-05000. java.io.IOException: Exception getting XceiverClient. 
at org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156) at org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289) at org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487) at org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: failed to create a child event loop at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234) at com.google.common.cache.LocalCache.get(LocalCache.java:3965) at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) at org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144) ... 
9 more Caused by: java.lang.IllegalStateException: failed to create a child event loop at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61) at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52) at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:44) at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:36) at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76) at org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151) at org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145) at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) ... 12 more Caused by: io.netty.channel.ChannelException: failed to open a new selector at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:128) at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120) at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87) at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64) ... 25 more Caused by: java.io.IOException: Too many open files at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method) at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:130) at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:69) at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
[jira] [Created] (HDFS-12366) Ozone: Refactor KSM metadata class names to avoid confusion
Weiwei Yang created HDFS-12366: -- Summary: Ozone: Refactor KSM metadata class names to avoid confusion Key: HDFS-12366 URL: https://issues.apache.org/jira/browse/HDFS-12366 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Trivial Propose to rename 2 classes in package {{org.apache.hadoop.ozone.ksm}} * MetadataManager -> KsmMetadataManager * MetadataManagerImpl -> KsmMetadataManagerImpl This is to avoid confusion with the ozone metadata store classes, such as {{MetadataKeyFilters}}, {{MetadataStore}} and {{MetadataStoreBuilder}} etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12365) Ozone: ListVolume displays incorrect createdOn time when the volume was created by OzoneRpcClient
Weiwei Yang created HDFS-12365: -- Summary: Ozone: ListVolume displays incorrect createdOn time when the volume was created by OzoneRpcClient Key: HDFS-12365 URL: https://issues.apache.org/jira/browse/HDFS-12365 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang Reproduce steps: 1. Create a key in ozone with corona (this delegates the call to OzoneRpcClient), e.g. {code} [wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 1 {code} 2. Run listVolume {code} [wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs oz -listVolume http://localhost:9864 -user wwei { "owner" : { "name" : "wwei" }, "quota" : { "unit" : "TB", "size" : 1048576 }, "volumeName" : "vol-0-31437", "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT", "createdBy" : null } { "owner" : { "name" : "wwei" }, "quota" : { "unit" : "TB", "size" : 1048576 }, "volumeName" : "vol-0-38900", "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT", "createdBy" : null } {code} Note, the times displayed in {{createdOn}} are both the incorrect {{Thu, 01 Jan 1970 00:00:00 GMT}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
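The telltale {{Thu, 01 Jan 1970 00:00:00 GMT}} is what an unset {{long}} timestamp (default value 0) looks like once formatted, which suggests the RPC-client path simply never populates the creation time. The formatter below mirrors the RFC 1123-style date seen in the output; it is a demonstration of the symptom, not the Ozone code itself:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Demonstrates the epoch-zero symptom: formatting millis=0 yields exactly
// the bogus timestamp shown by listVolume.
class EpochDefaultDemo {

    // Format a millisecond timestamp in the RFC 1123 style used by the CLI output.
    static String format(long millis) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }
}
```

So the likely fix is to set the creation time (e.g. `System.currentTimeMillis()`) when the volume is created through OzoneRpcClient, rather than leaving the field at its default.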
[jira] [Created] (HDFS-12362) Ozone: write deleted block to RAFT log for consensus on datanodes
Weiwei Yang created HDFS-12362: -- Summary: Ozone: write deleted block to RAFT log for consensus on datanodes Key: HDFS-12362 URL: https://issues.apache.org/jira/browse/HDFS-12362 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Per discussion in HDFS-12282, we need to write deleted block info to the RAFT log when that is ready; see more in the [comment from Anu|https://issues.apache.org/jira/browse/HDFS-12282?focusedCommentId=16136022&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16136022]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty
Weiwei Yang created HDFS-12361: -- Summary: Ozone: SCM failed to start when a container metadata is empty Key: HDFS-12361 URL: https://issues.apache.org/jira/browse/HDFS-12361 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, scm Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang When I run tests to create keys via corona, sometimes it leaves some containers with empty metadata. This might also happen when SCM stops at a point where the metadata has not yet been written. When this happens, we get the following error and SCM cannot be started {noformat} 17/08/27 20:10:57 WARN datanode.DataNode: Unexpected exception in block pool Block pool BP-821804790-172.16.165.133-1503887277256 (Datanode Uuid 7ee16a59-9604-406e-a0f8-6f44650a725b) service to ozone1.fyre.ibm.com/172.16.165.133:8111 java.lang.NullPointerException at org.apache.hadoop.ozone.container.common.helpers.ContainerData.getFromProtBuf(ContainerData.java:66) at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainerInfo(ContainerManagerImpl.java:210) at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:158) at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:99) at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:77) at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1592) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:409) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:783) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:286) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816) at java.lang.Thread.run(Thread.java:745) {noformat} We should add an NPE check and mark such containers as inactive without failing the SCM. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
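The guard proposed above can be sketched as follows; the method and state names are made up for illustration, and the real fix would sit around {{ContainerData.getFromProtBuf}} in {{ContainerManagerImpl.readContainerInfo}}:

```java
// Illustrative sketch of the NPE guard: a container whose on-disk metadata
// is missing or empty is loaded as INACTIVE instead of throwing a
// NullPointerException that aborts startup.
class ContainerLoader {

    // Decide the load state from the raw metadata bytes read from disk.
    static String loadState(byte[] metadata) {
        if (metadata == null || metadata.length == 0) {
            return "INACTIVE"; // degrade gracefully; operator can repair later
        }
        return "ACTIVE";
    }
}
```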
[jira] [Created] (HDFS-12354) Improve the throttle algorithm in Datanode BlockDeletingService
Weiwei Yang created HDFS-12354: -- Summary: Improve the throttle algorithm in Datanode BlockDeletingService Key: HDFS-12354 URL: https://issues.apache.org/jira/browse/HDFS-12354 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ozone Reporter: Weiwei Yang Assignee: Weiwei Yang {{BlockDeletingService}} is a per-datanode container block deleting service that is in charge of the "real" deletion of ozone blocks. It spawns a worker thread per container and deletes blocks/chunks from disk in background threads. The number of threads is currently throttled by {{ozone.block.deleting.container.limit.per.interval}}, but there is a potential problem: containers are sorted, so it always fetches the same set of containers. We need to fix this by creating an API in {{ContainerManagerImpl}} to get a shuffled list of containers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
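The shuffled-list idea can be sketched in a few lines; this is an illustrative stand-in for the proposed {{ContainerManagerImpl}} API, and the seed parameter exists only to make the example deterministic:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch: return a shuffled copy of the container list so the
// deleting service does not repeatedly pick the same (sorted) head
// containers every interval.
class ContainerPicker {

    // Shuffle a defensive copy; the caller's list is left untouched.
    static List<String> shuffledContainers(List<String> containers, long seed) {
        List<String> copy = new ArrayList<>(containers);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

The service would then take the first N entries of the shuffled list each interval, giving every container a fair chance of being serviced over time.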
[jira] [Resolved] (HDFS-12039) Ozone: Implement update volume owner in ozone shell
[ https://issues.apache.org/jira/browse/HDFS-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12039. Resolution: Fixed Fix Version/s: HDFS-7240 > Ozone: Implement update volume owner in ozone shell > --- > > Key: HDFS-12039 > URL: https://issues.apache.org/jira/browse/HDFS-12039 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Lokesh Jain > Fix For: HDFS-7240 > > > Ozone shell command {{updateVolume}} should support to update the owner of a > volume, using following syntax > {code} > hdfs oz -updateVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -owner > xyz -root > {code} > this could work from rest api, following command could change the volume > owner to {{www}} > {code} > curl -X PUT -H "Date: Mon, 26 Jun 2017 04:23:30 GMT" -H "x-ozone-version: v1" > -H "x-ozone-user:www" -H "Authorization:OZONE root" > http://ozone1.fyre.ibm.com:9864/volume-wwei-0 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12307) Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails
[ https://issues.apache.org/jira/browse/HDFS-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12307. Resolution: Duplicate Assignee: Weiwei Yang > Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails > --- > > Key: HDFS-12307 > URL: https://issues.apache.org/jira/browse/HDFS-12307 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > > It seems this UT constantly fails with following error > {noformat} > org.apache.hadoop.ozone.web.exceptions.OzoneException: Exception getting > XceiverClient. > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119) > at > com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:243) > at > com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:146) > at > com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:133) > at > com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1579) > at > com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1200) > at > org.apache.hadoop.ozone.web.exceptions.OzoneException.parse(OzoneException.java:248) > at > org.apache.hadoop.ozone.web.client.OzoneBucket.executeGetKey(OzoneBucket.java:395) > at > org.apache.hadoop.ozone.web.client.OzoneBucket.getKey(OzoneBucket.java:321) > at > org.apache.hadoop.ozone.web.client.TestKeys.runTestPutAndGetKeyWithDnRestart(TestKeys.java:288) > at > 
org.apache.hadoop.ozone.web.client.TestKeys.testPutAndGetKeyWithDnRestart(TestKeys.java:265) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12307) Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails
Weiwei Yang created HDFS-12307: -- Summary: Ozone: TestKeys#testPutAndGetKeyWithDnRestart fails Key: HDFS-12307 URL: https://issues.apache.org/jira/browse/HDFS-12307 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang It seems this UT constantly fails with following error {noformat} org.apache.hadoop.ozone.web.exceptions.OzoneException: Exception getting XceiverClient. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119) at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:243) at com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:146) at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:133) at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1579) at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1200) at org.apache.hadoop.ozone.web.exceptions.OzoneException.parse(OzoneException.java:248) at org.apache.hadoop.ozone.web.client.OzoneBucket.executeGetKey(OzoneBucket.java:395) at org.apache.hadoop.ozone.web.client.OzoneBucket.getKey(OzoneBucket.java:321) at org.apache.hadoop.ozone.web.client.TestKeys.runTestPutAndGetKeyWithDnRestart(TestKeys.java:288) at org.apache.hadoop.ozone.web.client.TestKeys.testPutAndGetKeyWithDnRestart(TestKeys.java:265) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12283) Ozone: DeleteKey-5: Implement SCM DeletedBlockLog
Weiwei Yang created HDFS-12283: -- Summary: Ozone: DeleteKey-5: Implement SCM DeletedBlockLog Key: HDFS-12283 URL: https://issues.apache.org/jira/browse/HDFS-12283 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, scm Reporter: Weiwei Yang Assignee: Weiwei Yang The DeletedBlockLog is a persisted log in SCM to keep track of container blocks which are under deletion. It maintains info about under-deletion container blocks notified by KSM, and the state of how they are processed. We can use RocksDB to implement the 1st version of the log; the schema looks like ||TxID||ContainerName||Block List||ProcessedCount|| |0|c1|b1,b2,b3|0| |1|c2|b1|3| |2|c2|b2, b3|-1| Some explanations # TxID is an incremental long transaction ID for ONE container and multiple blocks # Container name is the name of the container # Block list is a list of block IDs # ProcessedCount is the number of times SCM has sent this record to the datanode; it represents the "state" of the transaction and is in the range \[-1, 5\], where -1 means the transaction eventually failed after some retries and 5 is the max number of retries. We need to define {{DeletedBlockLog}} as an interface and implement it with the RocksDB {{MetadataStore}} as the first version. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
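The ProcessedCount lifecycle described in the table can be modeled as a tiny state machine; this mirrors the semantics stated above (a value in [-1, 5], -1 meaning permanently failed after the retry budget is spent), not the real RocksDB-backed implementation:

```java
// Illustrative model of one DeletedBlockLog transaction's ProcessedCount:
// starts at 0, increments each time SCM sends the record to the datanode,
// and flips to -1 once the maximum of 5 retries is exhausted.
class DeletedBlockTx {
    static final int MAX_RETRY = 5;
    private int processedCount = 0;

    // Record one more send attempt; mark failed once retries are exhausted.
    int incrementProcessed() {
        if (processedCount >= 0) {
            processedCount++;
            if (processedCount > MAX_RETRY) {
                processedCount = -1; // eventually failed
            }
        }
        return processedCount;
    }

    boolean failed() {
        return processedCount == -1;
    }
}
```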
[jira] [Created] (HDFS-12282) Ozone: DeleteKey-4: SCM periodically sends block deletion message to datanode via HB and handles response
Weiwei Yang created HDFS-12282: -- Summary: Ozone: DeleteKey-4: SCM periodically sends block deletion message to datanode via HB and handles response Key: HDFS-12282 URL: https://issues.apache.org/jira/browse/HDFS-12282 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ozone, scm Reporter: Weiwei Yang Assignee: Weiwei Yang This is task 3 in the design doc; it implements the SCM-to-datanode interactions, including # SCM sends block deletion messages via HB to the datanode # the datanode changes the block state to deleting when processing the HB response # the datanode sends deletion ACKs back to SCM # SCM handles the ACKs and removes blocks from the DB -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12246) Ozone: potential thread leaks
Weiwei Yang created HDFS-12246: -- Summary: Ozone: potential thread leaks Key: HDFS-12246 URL: https://issues.apache.org/jira/browse/HDFS-12246 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor Per the discussion in HDFS-12163, there might be some places that potentially leak threads; we will use this jira to track the work to fix those leaks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12235) Ozone: DeleteKey-3: KSM SCM block deletion message and ACK interactions
Weiwei Yang created HDFS-12235: -- Summary: Ozone: DeleteKey-3: KSM SCM block deletion message and ACK interactions Key: HDFS-12235 URL: https://issues.apache.org/jira/browse/HDFS-12235 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang KSM and SCM interaction for the delete key operation: both KSM and SCM store key state info in a backlog. KSM needs to scan this log and send block-deletion commands to SCM; once SCM is fully aware of the message, KSM removes the key completely from the namespace. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background
Weiwei Yang created HDFS-12196: -- Summary: Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background Key: HDFS-12196 URL: https://issues.apache.org/jira/browse/HDFS-12196 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Implement a recycling service running on the datanode to delete stale blocks periodically. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12195) Ozone: DeleteKey-1: KSM replies delete key request asynchronously
Weiwei Yang created HDFS-12195: -- Summary: Ozone: DeleteKey-1: KSM replies delete key request asynchronously Key: HDFS-12195 URL: https://issues.apache.org/jira/browse/HDFS-12195 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Yuanbo Liu We will implement delete key in ozone in multiple child tasks; this is one of the child tasks, implementing the client to SCM communication. We need to do it in an async manner: once the key state is changed in KSM metadata, KSM is ready to reply to the client with a success message. Actual deletes on other layers will happen some time later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey
Weiwei Yang created HDFS-12167: -- Summary: Ozone: Intermittent failure TestContainerPersistence#testListKey Key: HDFS-12167 URL: https://issues.apache.org/jira/browse/HDFS-12167 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, test Reporter: Weiwei Yang Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
Weiwei Yang created HDFS-12149: -- Summary: Ozone: RocksDB implementation of ozone metadata store Key: HDFS-12149 URL: https://issues.apache.org/jira/browse/HDFS-12149 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-12069 added a general interface for the ozone metadata store; we already have a leveldb implementation, and this JIRA is to track the work on the rocksdb implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties
Weiwei Yang created HDFS-12148: -- Summary: Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties Key: HDFS-12148 URL: https://issues.apache.org/jira/browse/HDFS-12148 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor The following properties added by HDFS-11493 are missing in ozone-default.xml {noformat} ozone.scm.max.container.report.threads ozone.scm.container.report.processing.interval.seconds ozone.scm.container.reports.wait.timeout.seconds {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12129) Ozone
Weiwei Yang created HDFS-12129: -- Summary: Ozone Key: HDFS-12129 URL: https://issues.apache.org/jira/browse/HDFS-12129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Weiwei Yang -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
Weiwei Yang created HDFS-12098: -- Summary: Ozone: Datanode is unable to register with scm if scm starts later Key: HDFS-12098 URL: https://issues.apache.org/jira/browse/HDFS-12098 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ozone, scm Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Reproducing steps # Start the datanode # Wait and observe the datanode state; it has connection issues, which is expected # Start SCM, expecting the datanode to connect to the scm and the state machine to transit to RUNNING. However, in actuality its state transits to SHUTDOWN and the datanode enters chill mode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12096) Ozone: Bucket versioning design document
[ https://issues.apache.org/jira/browse/HDFS-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-12096. Resolution: Duplicate > Ozone: Bucket versioning design document > > > Key: HDFS-12096 > URL: https://issues.apache.org/jira/browse/HDFS-12096 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: Ozone Bucket Versioning v1.pdf > > > This JIRA is opened for the discussion of the bucket versioning design. > Bucket versioning is the ability to hold multiple versions of the objects of a > key in a bucket. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12096) Ozone: Bucket versioning design document
Weiwei Yang created HDFS-12096: -- Summary: Ozone: Bucket versioning design document Key: HDFS-12096 URL: https://issues.apache.org/jira/browse/HDFS-12096 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang This JIRA is opened for the discussion of the bucket versioning design. Bucket versioning is the ability to hold multiple versions of the objects of a key in a bucket. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12085) Reconfigure namenode interval fails if the interval was set with time unit
Weiwei Yang created HDFS-12085: -- Summary: Reconfigure namenode interval fails if the interval was set with time unit Key: HDFS-12085 URL: https://issues.apache.org/jira/browse/HDFS-12085 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, tools Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical It fails when I set the duration with a time unit, e.g. 5s; error {noformat} Reconfiguring status for node [localhost:8111]: started at Tue Jul 04 08:14:18 PDT 2017 and finished at Tue Jul 04 08:14:18 PDT 2017. FAILED: Change property dfs.heartbeat.interval From: "3s" To: "5s" Error: For input string: "5s". {noformat} Time unit support was added via HDFS-9847. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
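The `For input string: "5s"` error is what a plain `Long.parseLong` produces when handed a unit-suffixed value. A minimal sketch of unit-aware parsing, similar in spirit to what HDFS-9847 enabled via {{Configuration#getTimeDuration}}; only the "s" suffix is handled here, and the class and method names are assumptions for illustration.

```java
// Minimal sketch of unit-aware duration parsing; handles only a trailing
// "s" (seconds) suffix for illustration. Names are hypothetical.
public class DurationParse {
  public static long toSeconds(String value) {
    String v = value.trim().toLowerCase();
    if (v.endsWith("s")) {
      // strip the unit suffix before parsing the number
      return Long.parseLong(v.substring(0, v.length() - 1));
    }
    // a bare number is interpreted as seconds
    return Long.parseLong(v);
  }
}
```

With this kind of parse in the reconfiguration path, "5s" resolves to 5 seconds instead of failing.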
[jira] [Created] (HDFS-12082) BlockInvalidateLimit value is incorrectly set after namenode heartbeat interval reconfigured
Weiwei Yang created HDFS-12082: -- Summary: BlockInvalidateLimit value is incorrectly set after namenode heartbeat interval reconfigured Key: HDFS-12082 URL: https://issues.apache.org/jira/browse/HDFS-12082 Project: Hadoop HDFS Issue Type: Bug Components: hdfs, namenode Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-1477 provides an option to reconfigure the namenode heartbeat interval without restarting the namenode. When the heartbeat interval is reconfigured, {{blockInvalidateLimit}} gets recomputed {code} this.blockInvalidateLimit = Math.max(20 * (int) (intervalSeconds), DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_DEFAULT); {code} This doesn't honor the existing value set by {{dfs.block.invalidate.limit}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
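The fix this report points toward can be sketched by contrasting the quoted recomputation with one that consults the configured limit. A hedged sketch, not the actual patch; 1000 mirrors Hadoop's {{DFS_BLOCK_INVALIDATE_LIMIT_DEFAULT}}, but treat the constant and method names here as assumptions.

```java
// Sketch contrasting the recomputation quoted above with a variant that
// honors the configured dfs.block.invalidate.limit. Illustrative only.
public class InvalidateLimit {
  static final int LIMIT_DEFAULT = 1000; // assumed to mirror the HDFS default

  // Form quoted in the description: only the hardcoded default is considered,
  // so a user-configured limit is silently dropped on reconfiguration.
  public static int recomputeIgnoringConf(long intervalSeconds) {
    return Math.max(20 * (int) intervalSeconds, LIMIT_DEFAULT);
  }

  // Proposed form: seed the floor with the configured limit instead.
  public static int recomputeHonoringConf(long intervalSeconds, int configuredLimit) {
    return Math.max(20 * (int) intervalSeconds, configuredLimit);
  }
}
```

For a 3s interval with `dfs.block.invalidate.limit=5000`, the first form resets the limit to 1000 while the second keeps 5000.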
[jira] [Created] (HDFS-12081) Ozone: Add infoKey REST API document
Weiwei Yang created HDFS-12081: -- Summary: Ozone: Add infoKey REST API document Key: HDFS-12081 URL: https://issues.apache.org/jira/browse/HDFS-12081 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-12030 has implemented {{infoKey}}; we need to add appropriate documentation to {{OzoneRest.md}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12080) Ozone: Fix UT failure in TestOzoneConfigurationFields
Weiwei Yang created HDFS-12080: -- Summary: Ozone: Fix UT failure in TestOzoneConfigurationFields Key: HDFS-12080 URL: https://issues.apache.org/jira/browse/HDFS-12080 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Priority: Minor HDFS-12023 added a test case {{TestOzoneConfigurationFields}} to make sure ozone configuration properties are fully documented in ozone-default.xml. This is currently failing because 1. ozone-default.xml has 1 property not used anywhere {code} ozone.scm.internal.bind.host {code} 2. Some cblock properties are missing in ozone-default.xml {code} dfs.cblock.scm.ipaddress dfs.cblock.scm.port dfs.cblock.jscsi-address dfs.cblock.service.rpc-bind-host dfs.cblock.jscsi.rpc-bind-host {code} This needs to be fixed. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12079) Description of dfs.block.invalidate.limit is incorrect in hdfs-default.xml
Weiwei Yang created HDFS-12079: -- Summary: Description of dfs.block.invalidate.limit is incorrect in hdfs-default.xml Key: HDFS-12079 URL: https://issues.apache.org/jira/browse/HDFS-12079 Project: Hadoop HDFS Issue Type: Bug Reporter: Weiwei Yang Assignee: Weiwei Yang The description of the property {{dfs.block.invalidate.limit}} in hdfs-default.xml is {noformat} Limit on the list of invalidated block list kept by the Namenode. {noformat} This is not correct and would confuse users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12069) Ozone: Create a general abstraction for metadata store
Weiwei Yang created HDFS-12069: -- Summary: Ozone: Create a general abstraction for metadata store Key: HDFS-12069 URL: https://issues.apache.org/jira/browse/HDFS-12069 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Create a general abstraction for the metadata store so that we can plug in other key-value stores to host ozone metadata. Currently only levelDB is implemented; we want to support RocksDB as it provides more production-ready features. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12053) Ozone: ozone server should create missing metadata directory if it has permission to
Weiwei Yang created HDFS-12053: -- Summary: Ozone: ozone server should create missing metadata directory if it has permission to Key: HDFS-12053 URL: https://issues.apache.org/jira/browse/HDFS-12053 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor The datanode state machine right now simply fails if the container metadata directory is missing; it is better to create the directory if it has permission to. This is extremely useful at a fresh setup; usually we set {{ozone.container.metadata.dirs}} to be under the same parent as {{dfs.datanode.data.dir}}. E.g * /hadoop/hdfs/data * /hadoop/hdfs/scm If I don't pre-create /hadoop/hdfs/scm/repository, ozone cannot be started. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
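The proposed startup behavior can be sketched as: create the metadata directory when absent and fail only when creation is not permitted. Class and method names below are assumptions made for this example.

```java
import java.io.File;

// Hedged sketch of "create the directory if it has permission to":
// succeed if the directory exists or can be created, fail otherwise.
public class MetadataDirInit {
  public static File ensureDir(String path) {
    File dir = new File(path);
    if (dir.isDirectory()) {
      return dir;  // already present, nothing to do
    }
    // mkdirs() returns false on failure, but also when another thread
    // created the directory first, so re-check before failing.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IllegalStateException("Cannot create metadata dir: " + path);
    }
    return dir;
  }
}
```

At a fresh setup this would create /hadoop/hdfs/scm/repository automatically instead of refusing to start.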
[jira] [Created] (HDFS-12047) Ozone: Add REST API documentation
Weiwei Yang created HDFS-12047: -- Summary: Ozone: Add REST API documentation Key: HDFS-12047 URL: https://issues.apache.org/jira/browse/HDFS-12047 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Add ozone REST API documentation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12039) Ozone: Implement update volume owner in ozone shell
Weiwei Yang created HDFS-12039: -- Summary: Ozone: Implement update volume owner in ozone shell Key: HDFS-12039 URL: https://issues.apache.org/jira/browse/HDFS-12039 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Ozone shell command {{updateVolume}} should support updating the owner of a volume, using the following syntax {code} hdfs oz -updateVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -owner xyz -root {code} This already works from the REST API; the following command changes the volume owner to {{www}} {code} curl -X PUT -H "Date: Mon, 26 Jun 2017 04:23:30 GMT" -H "x-ozone-version: v1" -H "x-ozone-user:www" -H "Authorization:OZONE root" http://ozone1.fyre.ibm.com:9864/volume-wwei-0 {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12037) Ozone: Improvement rest API output format for better looking
Weiwei Yang created HDFS-12037: -- Summary: Ozone: Improvement rest API output format for better looking Key: HDFS-12037 URL: https://issues.apache.org/jira/browse/HDFS-12037 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Right now ozone REST API output is displayed as a raw json string on a single line, which is not quite human readable, {noformat} {"volumes":[{"owner":{"name":"wwei"},"quota":{"unit":"GB","size":200},"volumeName":"volume-aug-1","createdOn":null,"createdBy":null}]} {noformat} Propose to improve the output format with a pretty printer {noformat}
{
  "volumes" : [ {
    "owner" : {
      "name" : "wwei"
    },
    "quota" : {
      "unit" : "GB",
      "size" : 200
    },
    "volumeName" : "volume-aug-1",
    "createdOn" : null,
    "createdBy" : null
  } ]
}
{noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-12035) Ozone: listKey doesn't work from ozone commandline
Weiwei Yang created HDFS-12035: -- Summary: Ozone: listKey doesn't work from ozone commandline Key: HDFS-12035 URL: https://issues.apache.org/jira/browse/HDFS-12035 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang HDFS-11782 implements the listKey operation on the KSM server side, but the commandline doesn't work right now, {code} ./bin/hdfs oz -listKey http://ozone1.fyre.ibm.com:9864/volume-wwei-0/bucket1/ {code} gives me the following output {noformat} Command Failed : {"httpCode":400,"shortMessage":"invalidBucketName","resource":"wwei","message":"Illegal max number of keys specified, the value must be in range (0, 1024], actual : 0.","requestID":"d1a33851-6bfa-48d2-9afc-9dd7b06dfb0e","hostName":"ozone1.fyre.ibm.com"} {noformat} I think we have the following things missing # ListKeyHandler doesn't support the common listing arguments: start, length and prefix. # The http request to {{Bucket#listBucket}} uses 0 as the default value; I think that's why we got the "Illegal max number of keys specified" error from the command line. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
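The error message suggests the server validates max-keys in the range (0, 1024] while the client sends an unset value of 0. One way the defaulting could look, as a hedged sketch with assumed names and an assumed fallback value:

```java
// Illustrative sketch of defaulting the max-keys argument before range
// validation; the class name, method name and DEFAULT value are assumptions.
public class ListKeyArgs {
  static final int MAX_LISTING_SIZE = 1024;     // upper bound from the error message
  static final int DEFAULT_LISTING_SIZE = 100;  // hypothetical fallback

  // Treat 0 or negative as "not set by the client" and fall back to the
  // default instead of failing validation with "Illegal max number of keys".
  public static int resolveMaxKeys(int requested) {
    if (requested <= 0) {
      return DEFAULT_LISTING_SIZE;
    }
    if (requested > MAX_LISTING_SIZE) {
      throw new IllegalArgumentException(
          "max keys must be in range (0, " + MAX_LISTING_SIZE + "]");
    }
    return requested;
  }
}
```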
[jira] [Resolved] (HDFS-11918) Ozone: Encapsulate KSM metadata key for better (de)serialization
[ https://issues.apache.org/jira/browse/HDFS-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-11918. Resolution: Later > Ozone: Encapsulate KSM metadata key for better (de)serialization > > > Key: HDFS-11918 > URL: https://issues.apache.org/jira/browse/HDFS-11918 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: HDFS-11918-HDFS-7240.001.patch > > > There are multiple types of keys stored in the KSM database > # Volume Key > # Bucket Key > # Object Key > # User Key > Currently they are represented as plain strings with some conventions, such as > # /volume > # /volume/bucket > # /volume/bucket/key > # $user > This approach makes it difficult to parse volume/bucket/keys from the KSM > database. Propose to encapsulate these types of keys into protobuf messages, > and take advantage of protobuf to serialize(deserialize) classes to byte > arrays (and vice versa). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11984) Ozone: Ensures listKey lists all required key fields
Weiwei Yang created HDFS-11984: -- Summary: Ozone: Ensures listKey lists all required key fields Key: HDFS-11984 URL: https://issues.apache.org/jira/browse/HDFS-11984 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang HDFS-11782 implements the listKey operation which only lists the basic key fields; we need to make sure it returns all required fields # version # md5hash # createdOn # size # keyName # dataFileName This task depends on the work of HDFS-11886. See more discussion [here | https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11959) Ozone: Audit Logs
Weiwei Yang created HDFS-11959: -- Summary: Ozone: Audit Logs Key: HDFS-11959 URL: https://issues.apache.org/jira/browse/HDFS-11959 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Add audit logs for ozone components. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11958) Ozone: Ensure KSM is initiated using ProtobufRpcEngine
Weiwei Yang created HDFS-11958: -- Summary: Ozone: Ensure KSM is initiated using ProtobufRpcEngine Key: HDFS-11958 URL: https://issues.apache.org/jira/browse/HDFS-11958 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Reproduction steps # Launch an ozone cluster # Create a volume via the command line {code} hdfs oz -createVolume http://ozone1.fyre.ibm.com:9864/volume-wwei-0 -user root {code} It failed with the following error {noformat} SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:182) at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:114) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:247) at com.sun.proxy.$Proxy18.createVolume(Unknown Source) ... Caused by: java.lang.NoSuchFieldException: versionID at java.lang.Class.getField(Class.java:1703) at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:178) ... 25 more {noformat} This was because {{keySpaceManagerClient}} in {{ObjectStoreHandler}} is currently not properly initialized; it should be using {{ProtobufRpcEngine}} instead of {{WritableRpcEngine}}, which is deprecated. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11955) Ozone: Set proper parameter default values for listBuckets http request
Weiwei Yang created HDFS-11955: -- Summary: Ozone: Set proper parameter default values for listBuckets http request Key: HDFS-11955 URL: https://issues.apache.org/jira/browse/HDFS-11955 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-11779 implements the listBuckets function on the ozone server side; the API supports several parameters: startKey, count and prefix, all of which are optional for the client-side rest API. This jira is to make sure we set proper default values in the http request if they are not explicitly set by users. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11952) Ozone: Fix regression TestContainerSQLCli#testConvertContainerDB
Weiwei Yang created HDFS-11952: -- Summary: Ozone: Fix regression TestContainerSQLCli#testConvertContainerDB Key: HDFS-11952 URL: https://issues.apache.org/jira/browse/HDFS-11952 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang TestContainerSQLCli#testConvertContainerDB is failing since HDFS-11568. Error message: {noformat} 2017-06-08 08:21:47,653 [main] ERROR - DB path not exist:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/MiniOzoneCluster1113d40f-586f-4914-9ac4-a37c1a3a561d/05bdadbc-1e60-46e0-bf57-efc4f21f2e7e/scm/container.db ... java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.ozone.scm.TestContainerSQLCli.testConvertContainerDB(TestContainerSQLCli.java:255) {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11951) Ozone
Weiwei Yang created HDFS-11951: -- Summary: Ozone Key: HDFS-11951 URL: https://issues.apache.org/jira/browse/HDFS-11951 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Weiwei Yang -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11926) Ozone: Implement a common helper to return a range of KVs in levelDB
Weiwei Yang created HDFS-11926: -- Summary: Ozone: Implement a common helper to return a range of KVs in levelDB Key: HDFS-11926 URL: https://issues.apache.org/jira/browse/HDFS-11926 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang There are quite a few *LIST* operations that need to get a range of keys or values from levelDB and filter entries by key prefix. # HDFS-11782 listKeys # HDFS-11779 listBuckets # HDFS-11773 listVolumes # HDFS-11679 listContainers We need to implement a common utility for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
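The shared helper's contract can be sketched against a {{TreeMap}} standing in for levelDB's sorted key space. The signature, names and use of a map instead of a real DB iterator are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a common "range of KVs" helper; a TreeMap stands in for the
// sorted levelDB key space, and the method signature is illustrative only.
public class RangeQuery {
  public static List<String> getRangeKVs(TreeMap<String, String> db,
      String startKey, int count, String prefix) {
    List<String> keys = new ArrayList<>();
    // tailMap yields keys >= startKey in sorted order, like a levelDB seek
    for (Map.Entry<String, String> e : db.tailMap(startKey, true).entrySet()) {
      if (keys.size() >= count) {
        break;  // stop once the requested page size is reached
      }
      if (prefix == null || e.getKey().startsWith(prefix)) {
        keys.add(e.getKey());
      }
    }
    return keys;
  }
}
```

Each of the four LIST operations would then reduce to one call with its own start key, page size and prefix.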
[jira] [Resolved] (HDFS-11917) Why when using the hdfs nfs gateway, a file which is smaller than one block size required a block
[ https://issues.apache.org/jira/browse/HDFS-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-11917. Resolution: Not A Problem Assignee: Weiwei Yang > Why when using the hdfs nfs gateway, a file which is smaller than one block > size required a block > - > > Key: HDFS-11917 > URL: https://issues.apache.org/jira/browse/HDFS-11917 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.8.0 >Reporter: BINGHUI WANG >Assignee: Weiwei Yang > > I use the linux shell to put a file into hdfs through the hdfs nfs > gateway. I found that if the file is smaller than one block (128M), > it still takes one block (128M) of hdfs storage this way. But after a > few minutes the excess storage is released. > e.g. If I put a file (60M) into hdfs through the hdfs nfs gateway, it > takes one block (128M) at first. After a few minutes the excess > storage (68M) is > released. The file only uses 60M of hdfs storage in the end. > Why is this? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11918) Ozone: Encapsulate KSM metadata key into protobuf messages for better (de)serialization
Weiwei Yang created HDFS-11918: -- Summary: Ozone: Encapsulate KSM metadata key into protobuf messages for better (de)serialization Key: HDFS-11918 URL: https://issues.apache.org/jira/browse/HDFS-11918 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical There are multiple types of keys stored in the KSM database # Volume Key # Bucket Key # Object Key # User Key Currently they are represented as plain strings with different conventions, such as # /volume # /volume/bucket # /volume/bucket/key # $user This approach makes it difficult to parse volume/bucket/keys from the KSM database. Propose to encapsulate these types of keys into protobuf messages, and take advantage of protobuf to serialize(deserialize) classes to byte arrays (and vice versa). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11913) Ozone: TestKeySpaceManager#testDeleteVolume fails
Weiwei Yang created HDFS-11913: -- Summary: Ozone: TestKeySpaceManager#testDeleteVolume fails Key: HDFS-11913 URL: https://issues.apache.org/jira/browse/HDFS-11913 Project: Hadoop HDFS Issue Type: Bug Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-11774 introduces a UT failure, {{TestKeySpaceManager#testDeleteVolume}}, error as below {noformat} java.util.NoSuchElementException at org.fusesource.leveldbjni.internal.JniDBIterator.peekNext(JniDBIterator.java:84) at org.fusesource.leveldbjni.internal.JniDBIterator.next(JniDBIterator.java:98) at org.fusesource.leveldbjni.internal.JniDBIterator.next(JniDBIterator.java:45) at org.apache.hadoop.ozone.ksm.MetadataManagerImpl.isVolumeEmpty(MetadataManagerImpl.java:221) at org.apache.hadoop.ozone.ksm.VolumeManagerImpl.deleteVolume(VolumeManagerImpl.java:294) at org.apache.hadoop.ozone.ksm.KeySpaceManager.deleteVolume(KeySpaceManager.java:340) at org.apache.hadoop.ozone.protocolPB.KeySpaceManagerProtocolServerSideTranslatorPB.deleteVolume(KeySpaceManagerProtocolServerSideTranslatorPB.java:200) at org.apache.hadoop.ozone.protocol.proto.KeySpaceManagerProtocolProtos$KeySpaceManagerService$2.callBlockingMethod(KeySpaceManagerProtocolProtos.java:22742) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) {noformat} This is caused by buggy code in {{MetadataManagerImpl#isVolumeEmpty}}; there are 2 issues that need to be fixed # Iterating to the next element throws this exception if there is no next element; this always fails when a volume is empty. # The code was checking whether the first bucket name starts with "/volume_name"; this returns a wrong value if I have several empty volumes with the same prefix, e.g. "/volA/", "/volAA/". In such a case {{isVolumeEmpty}} will return false, because the next element after "/volA/" is not a bucket; it's another volume "/volAA/" that matches the prefix. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
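The second issue reduces to a prefix check that is missing the trailing path separator. A minimal sketch, with a hypothetical method name, of the corrected check:

```java
// Sketch of the prefix check behind issue 2: including the trailing "/"
// prevents keys under "/volAA" from matching volume "volA". The class and
// method names are illustrative, not the actual MetadataManagerImpl code.
public class VolumePrefix {
  public static boolean belongsToVolume(String dbKey, String volumeName) {
    // buggy variant: dbKey.startsWith("/" + volumeName) would also match
    // "/volAA/bucket1" when checking volume "volA"
    return dbKey.startsWith("/" + volumeName + "/");
  }
}
```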
[jira] [Resolved] (HDFS-11740) Ozone: Differentiate time interval for different DatanodeStateMachine state tasks
[ https://issues.apache.org/jira/browse/HDFS-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-11740. Resolution: Later Will revisit this if necessary in the future. > Ozone: Differentiate time interval for different DatanodeStateMachine state > tasks > - > > Key: HDFS-11740 > URL: https://issues.apache.org/jira/browse/HDFS-11740 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-11740-HDFS-7240.001.patch, > HDFS-11740-HDFS-7240.002.patch, HDFS-11740-HDFS-7240.003.patch, > statemachine_1.png, statemachine_2.png > > > Currently the datanode state machine transitions between tasks at a fixed time > interval, defined by {{ScmConfigKeys#OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}}; > the default value is 30s. Once the datanode is started, it needs 90s before > transiting to the {{Heartbeat}} state, and such a long lag is not necessary. Propose > to improve the logic of the time interval handling; it seems only the heartbeat > task needs to be scheduled at the {{OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}} > interval, the rest should be done without any lag. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-11873) Ozone: Object store handler cannot serve requests from same http client
Weiwei Yang created HDFS-11873: -- Summary: Ozone: Object store handler cannot serve requests from same http client Key: HDFS-11873 URL: https://issues.apache.org/jira/browse/HDFS-11873 Project: Hadoop HDFS Issue Type: Sub-task Components: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical This issue was found while I worked on HDFS-11846. Instead of creating a new http client instance per request, I tried to reuse a {{CloseableHttpClient}} in the {{OzoneClient}} class via a {{PoolingHttpClientConnectionManager}}. However, every second request from the http client hangs and never gets dispatched to {{ObjectStoreJerseyContainer}}. There seems to be something wrong in the netty pipeline. This jira aims to 1) fix the problem on the server side and 2) pool client http clients to reduce the resource overhead.
[jira] [Resolved] (HDFS-11871) balance include Parameter Usage Error
[ https://issues.apache.org/jira/browse/HDFS-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-11871. Resolution: Not A Problem > balance include Parameter Usage Error > - > > Key: HDFS-11871 > URL: https://issues.apache.org/jira/browse/HDFS-11871 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.7.3 >Reporter: kevy liu >Assignee: Weiwei Yang >Priority: Trivial > > [hadoop@bigdata-hdp-apache505 hadoop-2.7.2]$ bin/hdfs balancer -h > Usage: hdfs balancer > [-policy ] the balancing policy: datanode or blockpool > [-threshold ] Percentage of disk capacity > [-exclude [-f | ]] > Excludes the specified datanodes. > [-include [-f | ]] > Includes only the specified datanodes. > [-idleiterations ] Number of consecutive idle > iterations (-1 for Infinite) before exit. > Parameter Description: > -f | > The parse separator in the code is: > String[] nodes = line.split("[ \t\n\f\r]+");
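The separator behavior quoted above can be demonstrated directly. {{parseHosts}} below is a hypothetical wrapper, not Balancer code; the split regex is taken verbatim from the description, and the added {{trim()}} (my addition) only avoids a leading empty token on indented lines:

```java
// Demonstrates how a balancer include/exclude hosts line is tokenized.
public class HostsLineParser {

    // Split a hosts-file line on any run of whitespace, using the separator
    // regex quoted in the issue description.
    static String[] parseHosts(String line) {
        return line.trim().split("[ \t\n\f\r]+");
    }
}
```

So a line may list several hosts separated by spaces or tabs; commas are not a supported separator, which is why the reported usage was "Not A Problem".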
[jira] [Created] (HDFS-11846) Ozone: Potential http connection leaks in ozone clients
Weiwei Yang created HDFS-11846: -- Summary: Ozone: Potential http connection leaks in ozone clients Key: HDFS-11846 URL: https://issues.apache.org/jira/browse/HDFS-11846 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Some http clients in {{OzoneVolume}}, {{OzoneBucket}} and {{OzoneClient}} are not properly closed, which causes resource leaks. This jira's purpose is to fix these issues and investigate whether we can reuse some of the http connections for better performance.
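A leak-safe pattern for such clients is try-with-resources, which closes the client even when an exception is thrown. The sketch below uses a stand-in {{FakeClient}} instead of the real {{CloseableHttpClient}} so it stays self-contained; all names are illustrative:

```java
// Illustrates the try-with-resources close pattern for http clients.
public class ClientCloseDemo {
    static int openClients = 0;

    // Stand-in for an AutoCloseable http client such as CloseableHttpClient.
    static class FakeClient implements AutoCloseable {
        FakeClient() { openClients++; }
        String get(String url) { return "200 OK"; }
        @Override public void close() { openClients--; }
    }

    static String fetch(String url) {
        // The client is closed when the block exits, normally or exceptionally,
        // so no connection leaks even on error paths.
        try (FakeClient client = new FakeClient()) {
            return client.get(url);
        }
    }
}
```

The same pattern applies wherever {{OzoneVolume}}, {{OzoneBucket}} or {{OzoneClient}} create a client per request; pooling (as attempted in HDFS-11873) is the follow-on optimization.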
[jira] [Created] (HDFS-11845) Ozone: Output error when DN handshakes with SCM
Weiwei Yang created HDFS-11845: -- Summary: Ozone: Output error when DN handshakes with SCM Key: HDFS-11845 URL: https://issues.apache.org/jira/browse/HDFS-11845 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor When starting SCM and DN, there is always an error in the SCM log {noformat} 17/05/17 15:19:59 WARN ipc.Server: IPC Server handler 9 on 9861, call Call#4 Retry#0 org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol.getVersion from 172.16.165.133:44824: output error 17/05/17 15:19:59 INFO ipc.Server: IPC Server handler 9 on 9861 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3216) at org.apache.hadoop.ipc.Server.access$1600(Server.java:135) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1463) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1533) at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2581) at org.apache.hadoop.ipc.Server$Connection.access$300(Server.java:1605) at org.apache.hadoop.ipc.Server$RpcCall.doResponse(Server.java:931) at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:765) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) {noformat}
[jira] [Created] (HDFS-11844) Ozone: Recover SCM state when SCM is restarted
Weiwei Yang created HDFS-11844: -- Summary: Ozone: Recover SCM state when SCM is restarted Key: HDFS-11844 URL: https://issues.apache.org/jira/browse/HDFS-11844 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, scm Reporter: Weiwei Yang Assignee: Weiwei Yang SCM loses its state once it is restarted. A simple test can be done with the following steps # Start NN, DN, SCM # Create several containers via the SCM CLI # Restart SCM # Get existing container state via the SCM CLI; this step fails with a container-doesn't-exist error. {{ContainerManagerImpl}} maintains a cache of container mappings, {{containerMap}}; if SCM is restarted, this information is lost. We need a way to restore the state from the DB in a background thread.
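The proposed recovery could look roughly like the following. The map-backed "db" and every name here are hypothetical stand-ins, not the actual {{ContainerManagerImpl}} code, which persists to an on-disk store:

```java
// Hypothetical sketch: rebuild the in-memory container mapping from a
// persistent store after an SCM restart, on a background thread.
public class ContainerCacheRecovery {
    final java.util.Map<String, String> containerMap =
        new java.util.concurrent.ConcurrentHashMap<>();

    // Replay every persisted container -> pipeline entry into the cache.
    void recoverFrom(java.util.Map<String, String> db) {
        Thread loader = new Thread(() -> containerMap.putAll(db));
        loader.start();
        try {
            loader.join(); // joined here only so callers observe a loaded cache
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the real system the loader would iterate the DB lazily and the SCM would answer lookups during loading (or reject them until recovery completes); the sketch only shows the cache-rebuild idea.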
[jira] [Created] (HDFS-11830) Ozone: Datanode needs to re-register to SCM if SCM is restarted
Weiwei Yang created HDFS-11830: -- Summary: Ozone: Datanode needs to re-register to SCM if SCM is restarted Key: HDFS-11830 URL: https://issues.apache.org/jira/browse/HDFS-11830 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Problem description: # Start NN, DN, SCM # Restart SCM and the following warning appears in the SCM log 17/05/02 00:47:08 WARN node.SCMNodeManager: SCM receive heartbeat from unregistered datanode The datanode could not re-establish communication with SCM afterwards. Propose to fix this by adding a new command in the HB handling that tells the datanode to re-register with SCM. Once the datanode receives this command, it transitions back to the REGISTER state to proceed.
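The proposed heartbeat handling can be sketched as a tiny state machine. Every name below is illustrative, not the real SCM/datanode protocol:

```java
// Sketch of the proposed fix: an unregistered heartbeat gets a RE_REGISTER
// command, and the datanode state machine transitions back to REGISTER.
public class ReRegisterDemo {
    enum DnState { REGISTER, HEARTBEAT }
    enum ScmCommand { ACK, RE_REGISTER }

    // SCM side: instead of only warning about an unknown datanode (e.g. after
    // an SCM restart), answer the heartbeat with a re-register command.
    static ScmCommand handleHeartbeat(boolean knownToScm) {
        return knownToScm ? ScmCommand.ACK : ScmCommand.RE_REGISTER;
    }

    // Datanode side: a RE_REGISTER command sends the state machine back to
    // the REGISTER state; otherwise it stays where it is.
    static DnState nextState(DnState current, ScmCommand cmd) {
        return cmd == ScmCommand.RE_REGISTER ? DnState.REGISTER : current;
    }
}
```

After re-registering, the datanode re-enters its normal REGISTER -> HEARTBEAT progression and communication resumes.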
[jira] [Created] (HDFS-11761) Ozone: Get container report should return closed container reports
Weiwei Yang created HDFS-11761: -- Summary: Ozone: Get container report should return closed container reports Key: HDFS-11761 URL: https://issues.apache.org/jira/browse/HDFS-11761 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Attachments: HDFS-11761-HDFS-7240.001.patch {{ContainerManagerImpl#getContainerReports}} should return only closed container reports, but it seems to mistakenly return open ones instead. We also need to add a unit test for this operation.
[jira] [Created] (HDFS-11725) Ozone: Revise create container CLI specification and implementation
Weiwei Yang created HDFS-11725: -- Summary: Ozone: Revise create container CLI specification and implementation Key: HDFS-11725 URL: https://issues.apache.org/jira/browse/HDFS-11725 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Per [design doc|https://issues.apache.org/jira/secure/attachment/12861478/storage-container-manager-cli-v002.pdf] in HDFS-11470 {noformat} hdfs scm -container create -p Notes : This command connects to SCM and creates a container. Once the container is created in the SCM, the corresponding container is created at the appropriate datanode. Optional -p allows the user to control which pipeline to use while creating this container, this is strictly for debugging and testing. {noformat} There are two problems with this design: 1st, it does not support a container name, which is quite useful for testing; 2nd, it supports an optional option for the pipeline, which is not really necessary right now given that SCM handles the creation of pipelines; we might want to support this later. So the proposal is to revise the CLI to {code} hdfs scm -container create -c {code} where the {{-c}} option is *required*. On the backend it performs the following steps # Given the container name, ask SCM where the container should be replicated to. This returns a pipeline. # Communicate with each datanode in the pipeline to create the container. This jira tracks the work to update both the design doc and the implementation.
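The two backend steps can be sketched as follows; both interfaces are hypothetical stand-ins for the real SCM and datanode RPCs:

```java
// Illustrative sketch of the create-container flow: ask SCM for a pipeline,
// then create the container on every datanode in that pipeline.
public class CreateContainerFlow {
    interface Scm {
        java.util.List<String> allocatePipeline(String containerName);
    }
    interface DatanodeClient {
        void createContainer(String datanode, String containerName);
    }

    static java.util.List<String> createContainer(Scm scm, DatanodeClient dn, String name) {
        // Step 1: SCM decides where the container should be replicated to.
        java.util.List<String> pipeline = scm.allocatePipeline(name);
        // Step 2: create the container on each datanode in the pipeline.
        for (String datanode : pipeline) {
            dn.createContainer(datanode, name);
        }
        return pipeline;
    }
}
```

This mirrors the revised CLI semantics: the name supplied with {{-c}} drives pipeline allocation, and no pipeline option is needed from the user.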
[jira] [Created] (HDFS-11716) Revisit delete container API
Weiwei Yang created HDFS-11716: -- Summary: Revisit delete container API Key: HDFS-11716 URL: https://issues.apache.org/jira/browse/HDFS-11716 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang The current delete container API can possibly run into an inconsistent state. SCM maintains a mapping of containers to nodes; the datanode maintains the actual container data. When deleting a container, we need to make sure the db is removed and that the mapping in SCM also gets updated. What if the datanode fails to remove the data for a container, do we still update the mapping? We need to revisit the implementation and get these issues addressed. See more discussion [here|https://issues.apache.org/jira/browse/HDFS-11675?focusedCommentId=15987798=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15987798]
[jira] [Created] (HDFS-11678) Ozone: SCM CLI: Implement get container metrics command
Weiwei Yang created HDFS-11678: -- Summary: Ozone: SCM CLI: Implement get container metrics command Key: HDFS-11678 URL: https://issues.apache.org/jira/browse/HDFS-11678 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Weiwei Yang Implement the command to get container metrics {code} hdfs scm -container metrics {code} This command returns container metrics in a certain format, e.g. JSON.
[jira] [Created] (HDFS-11677) Ozone: SCM CLI: Implement get container command
Weiwei Yang created HDFS-11677: -- Summary: Ozone: SCM CLI: Implement get container command Key: HDFS-11677 URL: https://issues.apache.org/jira/browse/HDFS-11677 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Implement get container {code} hdfs scm -container get -o {code} This command works only against a closed container. If the container is closed, then SCM will return the addresses of the datanodes. The datanodes support an API called copyContainer, which returns the container as a tar ball.
[jira] [Created] (HDFS-11676) Ozone: SCM CLI: Implement close container command
Weiwei Yang created HDFS-11676: -- Summary: Ozone: SCM CLI: Implement close container command Key: HDFS-11676 URL: https://issues.apache.org/jira/browse/HDFS-11676 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Implement close container {code} hdfs scm -container close {code} This command connects to SCM and closes a container. Once the container is closed in the SCM, the corresponding container is closed at the appropriate datanode. If the container does not exist, it will return an error.
[jira] [Created] (HDFS-11675) Ozone: SCM CLI: Implement delete container command
Weiwei Yang created HDFS-11675: -- Summary: Ozone: SCM CLI: Implement delete container command Key: HDFS-11675 URL: https://issues.apache.org/jira/browse/HDFS-11675 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Implement delete container {code} hdfs scm -container del -f {code} Deletes a container if it is empty. The -f option can be used to force a delete of a non-empty container. If the specified container name does not exist, print a clear error message.
[jira] [Created] (HDFS-11668) Ozone: misc improvements for SCM CLI
Weiwei Yang created HDFS-11668: -- Summary: Ozone: misc improvements for SCM CLI Key: HDFS-11668 URL: https://issues.apache.org/jira/browse/HDFS-11668 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Weiwei Yang Assignee: Weiwei Yang Once HDFS-11649 is done, some improvements need to be made so that the SCM CLI is better to use on a pseudo cluster, including # HDFS-11649 adds java classes for the CLIs; we will need to add shell code to expose these commands # Better error messages when some key ozone configurations are missing, e.g. {{ozone.scm.names}}, {{ozone.scm.datanode.id}} ... etc # Property {{ozone.enabled}} is not honored; we don't know why yet # Better logging. Currently {{DatanodeStateMachine}} prints very limited logs; adding more logs to indicate state transitions is necessary. The ultimate goal of this ticket is to ensure the SCM CLI works nicely on a pseudo cluster.
[jira] [Created] (HDFS-11658) Ozone: SCM daemon is unable to be started via CLI
Weiwei Yang created HDFS-11658: -- Summary: Ozone: SCM daemon is unable to be started via CLI Key: HDFS-11658 URL: https://issues.apache.org/jira/browse/HDFS-11658 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang The SCM daemon can no longer be started via CLI since the {{StorageContainerManager}} class package was renamed from {{org.apache.hadoop.ozone.storage.StorageContainerManager}} to {{org.apache.hadoop.ozone.scm.StorageContainerManager}} in HDFS-11184.
[jira] [Created] (HDFS-11655) Ozone: CLI: Guarantee users running ozone commands have appropriate permissions
Weiwei Yang created HDFS-11655: -- Summary: Ozone: CLI: Guarantee users running ozone commands have appropriate permissions Key: HDFS-11655 URL: https://issues.apache.org/jira/browse/HDFS-11655 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang We need to add a permission check module for the ozone command line utilities, to make sure users run commands with proper privileges. For now, all commands in the [design doc| https://issues.apache.org/jira/secure/attachment/12861478/storage-container-manager-cli-v002.pdf] require admin privilege.
[jira] [Created] (HDFS-11625) Ozone: Replace hard coded datanode data dir in test code with getStorageDir to fix UT failures
Weiwei Yang created HDFS-11625: -- Summary: Ozone: Replace hard coded datanode data dir in test code with getStorageDir to fix UT failures Key: HDFS-11625 URL: https://issues.apache.org/jira/browse/HDFS-11625 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang There seem to be some UT regressions after HDFS-11519, such as * TestDataNodeVolumeFailureToleration * TestDataNodeVolumeFailureReporting * TestDiskBalancerCommand * TestBlockStatsMXBean * TestDataNodeVolumeMetrics * TestDFSAdmin * TestDataNodeHotSwapVolumes * TestDataNodeVolumeFailure These tests set up the datanode data dir with hard coded names, such as {code} new File(cluster.getDataDirectory(), "data1"); {code} This no longer works since HDFS-11519 changed the pattern from {code} /data/data<2*dnIndex + 1> /data/data<2*dnIndex + 2> ... {code} to {code} /data/dn0_data0 /data/dn0_data1 /data/dn1_data0 /data/dn1_data1 ... {code}
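The two naming schemes can be compared with a small sketch. The helper names are illustrative, not the MiniDFSCluster API; the old scheme assumes two data dirs per datanode, matching the 2*dnIndex formula above:

```java
// Contrasts the old flat data<N> naming with the new per-datanode
// dn<i>_data<j> naming introduced by HDFS-11519 (hypothetical helpers).
public class DataDirNames {

    // Old scheme: dirs numbered globally 1,2,3,4,... with two per datanode,
    // i.e. data<2*dnIndex + dirIndex + 1> for dirIndex in {0, 1}.
    static String oldName(int dnIndex, int dirIndex) {
        return "data" + (2 * dnIndex + dirIndex + 1);
    }

    // New scheme: the datanode index is explicit in the directory name.
    static String newName(int dnIndex, int dirIndex) {
        return "dn" + dnIndex + "_data" + dirIndex;
    }
}
```

Tests that hard-code "data1" break under the new scheme, which is why the fix replaces literals with a lookup such as getStorageDir.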
[jira] [Created] (HDFS-11585) Ozone: Support force update a container
Weiwei Yang created HDFS-11585: -- Summary: Ozone: Support force update a container Key: HDFS-11585 URL: https://issues.apache.org/jira/browse/HDFS-11585 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-11567 added support for updating a container; in the following cases a container cannot be gracefully updated: # Container is closed # Container meta file is accidentally removed on disk or corrupted. It is useful to support a forcible update when a container gets into such a state, which gives us the chance to repair the metadata.
[jira] [Created] (HDFS-11581) Ozone: Support force delete a container
Weiwei Yang created HDFS-11581: -- Summary: Ozone: Support force delete a container Key: HDFS-11581 URL: https://issues.apache.org/jira/browse/HDFS-11581 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang On some occasions, we may want to forcibly delete a container regardless of whether the deletion condition is satisfied, e.g. the container being empty. This way we can make a best effort to clean up containers. Note, only a CLOSED container can be force deleted.
[jira] [Created] (HDFS-11569) Ozone: Implement listKey function for KeyManager
Weiwei Yang created HDFS-11569: -- Summary: Ozone: Implement listKey function for KeyManager Key: HDFS-11569 URL: https://issues.apache.org/jira/browse/HDFS-11569 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang List keys by prefix from a container. This doesn't need to support pagination, as the number of keys in a single container should be bounded.
[jira] [Created] (HDFS-11567) Support update container
Weiwei Yang created HDFS-11567: -- Summary: Support update container Key: HDFS-11567 URL: https://issues.apache.org/jira/browse/HDFS-11567 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Weiwei Yang Add support to update a container.
[jira] [Resolved] (HDFS-11413) HDFS fsck command shows health as corrupt for '/'
[ https://issues.apache.org/jira/browse/HDFS-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved HDFS-11413. Resolution: Not A Bug > HDFS fsck command shows health as corrupt for '/' > - > > Key: HDFS-11413 > URL: https://issues.apache.org/jira/browse/HDFS-11413 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Nishant Verma > > I have an open source hadoop version 2.7.3 cluster (2 Masters + 3 Slaves) > installed on AWS EC2 instances. I am using the cluster to integrate it with > Kafka Connect. > The cluster was set up last month and the kafka connect setup was > completed last fortnight. Since then, we were able to operate on the kafka topic > records on our HDFS and do various operations. > Since yesterday afternoon, I find that kafka topics are no longer getting committed to > the cluster. When I tried to open the older files, I started getting the error below. > When I copy a new file to the cluster from local, it arrives and can be > opened, but after some time it again starts showing a similar IOException: > == > 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 > file=/test/inputdata/derby.log > 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: > java.io.IOException: No live nodes contain block > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking > nodes = [], ignoredNodes = null No live nodes contain current block Block > locations: Dead nodes: . Will get new block locations from namenode and > retry... > 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 > IOException, will wait for 499.3472970548959 msec.
> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 > file=/test/inputdata/derby.log > 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: > java.io.IOException: No live nodes contain block > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking > nodes = [], ignoredNodes = null No live nodes contain current block Block > locations: Dead nodes: . Will get new block locations from namenode and > retry... > 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 > IOException, will wait for 4988.873277172643 msec. > 17/02/14 07:58:00 INFO hdfs.DFSClient: No node available for > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 > file=/test/inputdata/derby.log > 17/02/14 07:58:00 INFO hdfs.DFSClient: Could not obtain > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: > java.io.IOException: No live nodes contain block > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking > nodes = [], ignoredNodes = null No live nodes contain current block Block > locations: Dead nodes: . Will get new block locations from namenode and > retry... > 17/02/14 07:58:00 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 > IOException, will wait for 8598.311122824263 msec. > 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 > file=/test/inputdata/derby.log No live nodes contain current block Block > locations: Dead nodes: . Throwing a BlockMissingException > 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 > file=/test/inputdata/derby.log No live nodes contain current block Block > locations: Dead nodes: .
Throwing a BlockMissingException > 17/02/14 07:58:09 WARN hdfs.DFSClient: DFS Read > org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: > BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 > file=/test/inputdata/derby.log > at > org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:983) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119) > at > org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107) > at >