[jira] [Created] (HDFS-13128) HDFS balancer in single node cluster fails with "Another Balancer is running.."
Zbigniew Kostrzewa created HDFS-13128:
-----------------------------------------

             Summary: HDFS balancer in single node cluster fails with "Another Balancer is running.."
                 Key: HDFS-13128
                 URL: https://issues.apache.org/jira/browse/HDFS-13128
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: balancer & mover, hdfs
    Affects Versions: 2.7.3
            Reporter: Zbigniew Kostrzewa

In a single node "cluster", HDFS balancer fails with:

{noformat}
Time Stamp  Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
java.io.IOException: Another Balancer is running..  Exiting ...
{noformat}

and in Name Node logs there is:

{noformat}
2018-02-09 07:23:21,671 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073741865_1041{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-dae233d3-5c71-498e-9a8b-669bff3fccdf:NORMAL:10.9.4.184:30010|RBW]]} for /system/balancer.id
2018-02-09 07:23:21,739 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1126407107_1
2018-02-09 07:23:21,758 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to APPEND_FILE /system/balancer.id for DFSClient_NONMAPREDUCE_1275100437_1 on 10.9.4.184 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-1126407107_1 on 10.9.4.184
2018-02-09 07:23:21,758 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.append from 10.9.4.184:49781 Call#12 Retry#0: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to APPEND_FILE /system/balancer.id for DFSClient_NONMAPREDUCE_1275100437_1 on 10.9.4.184 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-1126407107_1 on 10.9.4.184
2018-02-09 07:23:21,773 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.9.4.184:30010 is added to blk_1073741865_1041{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-dae233d3-5c71-498e-9a8b-669bff3fccdf:NORMAL:10.9.4.184:30010|RBW]]} size 15
2018-02-09 07:23:21,776 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /system/balancer.id is closed by DFSClient_NONMAPREDUCE_-1126407107_1
{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
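The logs above show the balancer's single-instance guard at work: each balancer appends to /system/balancer.id, and the NameNode rejects the second append with AlreadyBeingCreatedException because the file lease is still held by the first client. A rough, self-contained analogy of that mutual exclusion, using an OS-level file lock in place of the HDFS lease (class and method names here are illustrative, not the actual Balancer code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: two "balancers" race to claim the same lock file.
// HDFS enforces this with a NameNode file lease on /system/balancer.id;
// here a local file lock stands in for the lease to show the exclusion.
public class BalancerLockSketch {

  // Try to claim the lock file; false means another balancer holds it.
  static boolean tryClaim(FileChannel ch) throws IOException {
    try {
      return ch.tryLock() != null;
    } catch (OverlappingFileLockException e) {
      return false; // the lock is already held within this JVM
    }
  }

  public static void main(String[] args) throws IOException {
    Path lockFile = Files.createTempFile("balancer", ".id");
    try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE);
         FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
      if (!tryClaim(first)) {
        throw new AssertionError("first balancer should acquire the lock");
      }
      if (tryClaim(second)) {
        throw new AssertionError("second balancer must be rejected while the first holds it");
      }
    } finally {
      Files.deleteIfExists(lockFile);
    }
  }
}
```

In the real cluster the HDFS lease, not an OS lock, is the arbiter, which is why a lease held by another client on the same host can surface as "Another Balancer is running" even when only one balancer process is visible.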
[jira] [Created] (HDFS-13127) Fix TestContainerStateManager and TestOzoneConfigurationFields
Mukul Kumar Singh created HDFS-13127:
----------------------------------------

             Summary: Fix TestContainerStateManager and TestOzoneConfigurationFields
                 Key: HDFS-13127
                 URL: https://issues.apache.org/jira/browse/HDFS-13127
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: ozone
    Affects Versions: HDFS-7240
            Reporter: Mukul Kumar Singh
            Assignee: Mukul Kumar Singh
             Fix For: HDFS-7240

TestContainerStateManager is failing because SCM is unable to find a container with enough free space to allocate a new block in the container.

TestOzoneConfigurationFields is failing because the configs "ozone.rest.servers" and "ozone.rest.client.port" are added in ozone-default.xml but are not declared among the config key constants.
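For context, a configuration-fields test of this kind cross-checks the properties in the default XML file against the config key constants declared in code. A minimal sketch of that check (the sets stand in for the parsed ozone-default.xml and the declared key constants; the class name is hypothetical, not the actual test):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of what a check like TestOzoneConfigurationFields
// verifies: every property present in the default XML should have a
// matching config key constant declared in the code.
public class ConfigFieldsCheck {

  // Returns the XML properties that have no declared key constant,
  // e.g. "ozone.rest.servers" in the failure described above.
  public static Set<String> missingFromCode(Set<String> xmlProps,
                                            Set<String> declaredKeys) {
    Set<String> missing = new HashSet<>(xmlProps);
    missing.removeAll(declaredKeys);
    return missing;
  }
}
```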
Re: [DISCUSS] Meetup for HDFS tests and build infra
Thanks Chris for driving on this.

>> I'm looking at you, TestDFSStripedOutputStreamWithFailure

AFAIK and IMO, it's pretty hard to get all the test cases running stably given the limitations of MiniDFSCluster. If we agree on that, we could remove these cases as unit tests and cover them in integration tests instead, using a true cluster, e.g. based on k8s infra. We lack the basic infra, environments, and tools to get most of the complicated functionality well tested and covered, so let's avoid overly complicated tests. Fixing such tests would definitely help and be appreciated.

Regards,
Kai

------------------------------------------------------------------
From: Chris Douglas
Sent: Thursday, February 8, 2018, 08:39
To: Hdfs-dev
Subject: Re: [DISCUSS] Meetup for HDFS tests and build infra

Created a poll [1] to inform scheduling. -C

[1]: https://doodle.com/poll/r22znitzae9apfbf

On Tue, Feb 6, 2018 at 3:09 PM, Chris Douglas wrote:
> The HDFS build is not healthy. Many of the unit tests aren't actually
> run in Jenkins due to resource exhaustion, haven't been updated since
> build/test/data was the test temp dir, or are chronically unstable
> (I'm looking at you, TestDFSStripedOutputStreamWithFailure). The
> situation has deteriorated slowly, but we can't confidently merge
> patches, let alone significant features, when our CI infra is in this
> state.
>
> How would folks feel about a half to full-day meetup to work through
> patches improving this, specifically? We can improve tests,
> troubleshoot the build, and rev/commit existing patches. It would
> require some preparation, so the simultaneous attention is productive
> and not a coordination bottleneck. I started a wiki page for this [1],
> please add to it.
>
> If enough people can make time for this, say in 2-3 weeks, the project
> would certainly benefit. -C
>
> [1]: https://s.apache.org/ng3C
[jira] [Resolved] (HDFS-13122) Tailing edits should not update quota counts on ObserverNode
[ https://issues.apache.org/jira/browse/HDFS-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HDFS-13122.
--------------------------------
    Resolution: Duplicate

> Tailing edits should not update quota counts on ObserverNode
> ------------------------------------------------------------
>
>                 Key: HDFS-13122
>                 URL: https://issues.apache.org/jira/browse/HDFS-13122
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs, namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>
> Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call
> {code}
> updateCountForQuota(target.getBlockManager().getStoragePolicySuite(),
>     target.dir.rootDir);
> {code}
> to update the quota counts for the entire namespace, which can be very
> expensive. This makes sense if we are about to become the ANN, since we need
> valid quotas, but not on an ObserverNode which does not need to enforce
> quotas.
>
> This is related to increasing the frequency with which the SbNN can tail
> edits from the ANN to decrease the lag time for transactions to appear on the
> Observer.
[jira] [Reopened] (HDFS-13120) Snapshot diff could be corrupted after concat
[ https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee reopened HDFS-13120:
-------------------------------

This breaks 2.8 compilation. Reverting...

> Snapshot diff could be corrupted after concat
> ---------------------------------------------
>
>                 Key: HDFS-13120
>                 URL: https://issues.apache.org/jira/browse/HDFS-13120
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, snapshots
>    Affects Versions: 2.7.0
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Major
>             Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6
>
>         Attachments: HDFS-13120.001.patch, HDFS-13120.002.patch
>
>
> The snapshot diff can be corrupted after concat files. This could lead to
> Assertion upon DeleteSnapshot and getSnapshotDiff operations later.
> For example, we have seen customers hit stack trace similar to the one below
> but during loading edit entry of DeleteSnapshotOp. After the investigation,
> we found this is a regression caused by HDFS-3689 where the snapshot diff is
> not fully cleaned up after concat.
> I will post the unit test to repro this and fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>         at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>         at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>         at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code}
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/130/

[Feb 7, 2018 4:58:28 PM] (billie) Revert "YARN-6078. Containers stuck in Localizing state. Contributed by
[Feb 7, 2018 11:35:41 PM] (yzhang) HDFS-13115. In getNumUnderConstructionBlocks(), ignore the inodeIds for

-1 overall

The following subsystems voted -1:
    asflicense unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    Unreaped Processes:
        hadoop-common:1 hadoop-hdfs:24 bkjournal:5 hadoop-yarn-common:1
        hadoop-yarn-server-nodemanager:1 hadoop-yarn-server-timelineservice:1
        hadoop-yarn-client:4 hadoop-yarn-applications-distributedshell:1
        hadoop-mapreduce-client-app:1 hadoop-mapreduce-client-jobclient:14
        hadoop-distcp:2 hadoop-extras:1

    Failed junit tests:
        hadoop.hdfs.TestBlocksScheduledCounter
        hadoop.hdfs.TestDFSClientFailover
        hadoop.hdfs.web.TestHttpsFileSystem
        hadoop.hdfs.TestSetTimes
        hadoop.hdfs.TestDatanodeRegistration
        hadoop.hdfs.web.TestWebHdfsFileSystemContract
        hadoop.hdfs.web.TestWebHDFSAcl
        hadoop.hdfs.TestDatanodeReport
        hadoop.hdfs.TestMiniDFSCluster
        hadoop.hdfs.web.TestHftpFileSystem
        hadoop.hdfs.TestDFSClientRetries
        hadoop.yarn.server.nodemanager.webapp.TestNMWebServer
        hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime
        hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels
        hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
        hadoop.mapred.TestJavaSerialization
        hadoop.mapreduce.TestMRJobClient
        hadoop.mapred.TestClientRedirect
        hadoop.mapred.TestMapProgress
        hadoop.mapred.TestReduceFetch
        hadoop.mapreduce.security.ssl.TestEncryptedShuffle
        hadoop.mapred.TestLocalJobSubmission
        hadoop.mapreduce.security.TestBinaryTokenFile
        hadoop.mapreduce.security.TestJHSSecurity
        hadoop.fs.TestFileSystem
        hadoop.mapreduce.TestChild
        hadoop.mapreduce.security.TestMRCredentials
        hadoop.conf.TestNoDefaultsJobConf
        hadoop.fs.TestDFSIO
        hadoop.mapred.TestJobSysDirWithDFS
        hadoop.tools.TestIntegration
        hadoop.tools.TestDistCpViewFs
        hadoop.resourceestimator.solver.impl.TestLpSolver
        hadoop.resourceestimator.service.TestResourceEstimatorService

    Timed out junit tests:
        org.apache.hadoop.log.TestLogLevel
        org.apache.hadoop.hdfs.TestLeaseRecovery2
        org.apache.hadoop.security.TestPermission
        org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream
        org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade
        org.apache.hadoop.hdfs.TestFileAppendRestart
        org.apache.hadoop.hdfs.TestReadWhileWriting
        org.apache.hadoop.hdfs.security.TestDelegationToken
        org.apache.hadoop.security.TestPermissionSymlinks
        org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter
        org.apache.hadoop.hdfs.TestDFSMkdirs
        org.apache.hadoop.hdfs.TestDFSOutputStream
        org.apache.hadoop.hdfs.web.TestWebHDFS
        org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs
        org.apache.hadoop.hdfs.web.TestWebHDFSXAttr
        org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
        org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
        org.apache.hadoop.hdfs.TestDistributedFileSystem
        org.apache.hadoop.hdfs.web.TestWebHDFSForHA
        org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication
        org.apache.hadoop.hdfs.TestDFSShell
        org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM
        org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager
        org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
        org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir
        org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead
        org.apache.hadoop.yarn.webapp.TestWebApp
        org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot
        org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServices
        org.apache.hadoop.yarn.client.TestRMFailover
        org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
        org.apache.hadoop.yarn.client.api.impl.TestYarnClientWithReservation
        org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
        org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
        org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
        org.apache.hadoop.mapred.lib.TestDelegatingInputFormat
[jira] [Created] (HDFS-13126) Re-enable HTTP Request Logging for WebHDFS
Erik Krogen created HDFS-13126:
----------------------------------

             Summary: Re-enable HTTP Request Logging for WebHDFS
                 Key: HDFS-13126
                 URL: https://issues.apache.org/jira/browse/HDFS-13126
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode, webhdfs
    Affects Versions: 2.7.0
            Reporter: Erik Krogen

Due to HDFS-7279, starting in 2.7.0, the DataNode HTTP request logs no longer include WebHDFS requests, because request logging was handled inside {{HttpServer2}}, which is no longer used. If request logging is enabled, we should add a Netty [LoggingHandler|https://netty.io/4.0/api/io/netty/handler/logging/LoggingHandler.html] to the ChannelPipeline for the http(s) servers used by the DataNode.
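The proposal is to put a logging stage back into the request pipeline. As a self-contained illustration of the idea (not the DataNode's actual Netty code), the sketch below uses the JDK's built-in HTTP server with a Filter that records each request before it reaches the handler, playing the role a Netty LoggingHandler would play in the DataNode's ChannelPipeline. All class and path names here are hypothetical:

```java
import com.sun.net.httpserver.Filter;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class RequestLogSketch {
  static final List<String> ACCESS_LOG = new CopyOnWriteArrayList<>();

  // Records one line per request before handing it on -- the same shape as
  // a logging handler inserted ahead of the WebHDFS handler in a pipeline.
  static final Filter LOG_FILTER = new Filter() {
    @Override
    public void doFilter(HttpExchange ex, Chain chain) throws IOException {
      ACCESS_LOG.add(ex.getRequestMethod() + " " + ex.getRequestURI());
      chain.doFilter(ex); // pass the request on unchanged
    }

    @Override
    public String description() {
      return "access log";
    }
  };

  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
    server.createContext("/webhdfs", ex -> {
      byte[] body = "ok".getBytes();
      ex.sendResponseHeaders(200, body.length);
      try (OutputStream os = ex.getResponseBody()) {
        os.write(body);
      }
    }).getFilters().add(LOG_FILTER);
    server.start();
    try {
      int port = server.getAddress().getPort();
      URL url = new URL("http://localhost:" + port + "/webhdfs/v1/tmp");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      int status = conn.getResponseCode();
      conn.getInputStream().close();
      if (status != 200 || !ACCESS_LOG.contains("GET /webhdfs/v1/tmp")) {
        throw new AssertionError("request was not logged: " + ACCESS_LOG);
      }
    } finally {
      server.stop(0);
    }
  }
}
```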
Re: Apache Hadoop 3.0.1 Release plan
Hi, Brahma

Thanks for the reminder. YARN-5742 does not look like a blocker to me. I will create a RC right after HADOOP-14060.

On Thu, Feb 8, 2018 at 7:35 AM, Kihwal Lee wrote:
> HADOOP-14060 is a blocker. Daryn will add more detail to the jira or to
> this thread.
>
> On Thu, Feb 8, 2018 at 7:01 AM, Brahma Reddy Battula wrote:
>>
>> Hi Eddy,
>>
>> HDFS-12990 got committed to 3.0.1, can we have RC for 3.0.1 (only
>> YARN-5742 blocker is open)?
>>
>> On Sat, Feb 3, 2018 at 12:40 AM, Chris Douglas wrote:
>>
>>> On Fri, Feb 2, 2018 at 10:22 AM, Arpit Agarwal wrote:
>>> > Do you plan to roll an RC with an uncommitted fix? That isn't the
>>> > right approach.
>>>
>>> The fix will be committed to the release branch. We'll vote on the
>>> release, and if it receives a majority of +1 votes then it becomes
>>> 3.0.1. That's how the PMC decides how to move forward. In this case,
>>> that will also resolve whether or not it can be committed to trunk.
>>>
>>> If this logic is unpersuasive, then we can require a 2/3 majority to
>>> replace the codebase. Either way, the PMC will vote to define the
>>> consensus view when it is not emergent.
>>>
>>> > This issue has good visibility and enough discussion.
>>>
>>> Yes, it has. We always prefer consensus to voting, but when discussion
>>> reveals that complete consensus is impossible, we still need a way
>>> forward. This is rare, and usually reserved for significant changes
>>> (like merging YARN). Frankly, it's embarrassing to resort to it here,
>>> but here we are.
>>>
>>> > If there is a binding veto in effect then the change must be
>>> > abandoned. Else you should be able to proceed with committing.
>>> > However, 3.0.0 must be called out as an abandoned release if we
>>> > commit it.
>>>
>>> This is not accurate. A binding veto from any committer halts
>>> progress, but the PMC sets the direction of the project. That includes
>>> making decisions that are not universally accepted. -C
>>>
>>> > On 2/1/18, 3:01 PM, "Lei Xu" wrote:
>>> >
>>> > Sounds good to me, ATM.
>>> >
>>> > On Thu, Feb 1, 2018 at 2:34 PM, Aaron T. Myers wrote:
>>> > > Hey Anu,
>>> > >
>>> > > My feeling on HDFS-12990 is that we've discussed it quite a bit
>>> > > already and it doesn't seem at this point like either side is
>>> > > going to budge. I'm certainly happy to have a phone call about it,
>>> > > but I don't expect that we'd make much progress.
>>> > >
>>> > > My suggestion is that we simply include the patch posted to
>>> > > HDFS-12990 in the 3.0.1 RC and call this issue out clearly in the
>>> > > subsequent VOTE thread for the 3.0.1 release. Eddy, are you up for
>>> > > that?
>>> > >
>>> > > Best,
>>> > > Aaron
>>> > >
>>> > > On Thu, Feb 1, 2018 at 1:13 PM, Lei Xu wrote:
>>> > >>
>>> > >> +Xiao
>>> > >>
>>> > >> My understanding is that we will have this for 3.0.1. Xiao,
>>> > >> could you give your inputs here?
>>> > >>
>>> > >> On Thu, Feb 1, 2018 at 11:55 AM, Anu Engineer
>>> > >> <aengin...@hortonworks.com> wrote:
>>> > >> > Hi Eddy,
>>> > >> >
>>> > >> > Thanks for driving this release. Just a quick question, do we
>>> > >> > have time to close this issue?
>>> > >> > https://issues.apache.org/jira/browse/HDFS-12990
>>> > >> >
>>> > >> > or are we abandoning it? I believe that this is the last
>>> > >> > window for us to fix this issue.
>>> > >> >
>>> > >> > Should we have a call and get this resolved one way or
>>> > >> > another?
>>> > >> >
>>> > >> > Thanks
>>> > >> > Anu
>>> > >> >
>>> > >> > On 2/1/18, 10:51 AM, "Lei Xu" wrote:
>>> > >> >
>>> > >> >     Hi, All
>>> > >> >
>>> > >> >     I just cut branch-3.0.1 from branch-3.0. Please make sure
>>> > >> >     all patches targeted to 3.0.1 being checked in both
>>> > >> >     branch-3.0 and branch-3.0.1.
>>> > >> >
>>> > >> >     Thanks!
>>> > >> >     Eddy
>>> > >> >
>>> > >> >     On Tue, Jan 9, 2018 at 11:17 AM, Lei Xu wrote:
>>> > >> >     > Hi, All
>>> > >> >     >
>>> > >> >     > We have released Apache Hadoop 3.0.0 in December [1]. To
>>> > >> >     > further improve the quality of release, we plan to cut
>>> > >> >     > branch-3.0.1 branch tomorrow for the preparation of
>>> > >> >     > Apache Hadoop 3.0.1 release. The focus of 3.0.1 will be
>>> > >> >     > fixing blockers (3), critical bugs (1) and bug fixes
>>> > >> >     > [2]. No new features and
[jira] [Created] (HDFS-13125) Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing
Erik Krogen created HDFS-13125:
----------------------------------

             Summary: Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing
                 Key: HDFS-13125
                 URL: https://issues.apache.org/jira/browse/HDFS-13125
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: journal-node, namenode
            Reporter: Erik Krogen
            Assignee: Erik Krogen

The current edit tailing pipeline is designed for:
* High resiliency
* High throughput

and was _not_ designed for low latency. It was designed under the assumption that each edit log segment would typically be read all at once, e.g. on startup, or the SbNN tailing the entire thing after it is finalized. The ObserverNode should be reading constantly from the JournalNodes' in-progress edit logs with low latency, to reduce the lag time from when a transaction is committed on the ANN to when it is visible on the ObserverNode.

Due to the critical nature of this pipeline to the health of HDFS, it would be better not to redesign it altogether. Based on some experiments, it seems that if we mitigate the following issues, lag times are reduced to low levels (low hundreds of milliseconds even under very high write load):

* The overhead of creating a new HTTP connection each time new edits are fetched. This makes sense when you're expecting to tail an entire segment; it does not when you may only be fetching a small number of edits. We can mitigate this by allowing edits to be tailed via an RPC call, or by adding a connection pool for the existing connections to the journal.
* The overhead of transmitting a whole file at once. Right now when an edit segment is requested, the JN sends the entire segment, and the SbNN ignores edits up to the ones it wants. How to solve this may be trickier, but one suggestion would be to keep recently logged edits in memory, avoiding the need to serve them from file at all, allowing the JN to quickly serve only the required edits.
We can implement these as optimizations on top of the existing logic, with fallbacks to the current slow-but-resilient pipeline.
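The second idea above, keeping recently logged edits in memory so the JN can serve "edits since txid N" without touching the on-disk segment, can be sketched as a bounded cache with a fallback signal when the requester has fallen behind. Class and method names here are hypothetical, not the actual HDFS implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a JournalNode-side cache of recent (txid, edit)
// pairs. A tailer asks for everything after its last txid; if those edits
// have already been evicted, it must fall back to the on-disk segment path.
public class RecentEditsCache {

  public static final class Edit {
    public final long txid;
    public final byte[] data;
    public Edit(long txid, byte[] data) { this.txid = txid; this.data = data; }
    public long txid() { return txid; }
  }

  private final ArrayDeque<Edit> cache = new ArrayDeque<>();
  private final int capacity;

  public RecentEditsCache(int capacity) { this.capacity = capacity; }

  public synchronized void log(Edit e) {
    cache.addLast(e);
    if (cache.size() > capacity) {
      cache.removeFirst(); // oldest edits are now only on disk
    }
  }

  // Returns edits with txid > sinceTxid, or null if the requested range
  // has been evicted and the caller must use the slow on-disk path.
  public synchronized List<Edit> getEditsSince(long sinceTxid) {
    if (!cache.isEmpty() && cache.peekFirst().txid() > sinceTxid + 1) {
      return null; // gap between the caller's position and the cache
    }
    List<Edit> out = new ArrayList<>();
    for (Edit e : cache) {
      if (e.txid() > sinceTxid) {
        out.add(e);
      }
    }
    return out;
  }
}
```

The null-on-gap return is what makes this an optimization layered on the existing logic: the caller keeps the current resilient segment-fetching code as its fallback.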
[CFP] Dataworks Summit San Jose Call for Presentations closes Friday 9 Feb
All,

Dataworks Summit San Jose 2018 is June 17-21. The call for abstracts is open through February 9th. One of the tracks is Big Compute And Storage, which is great for talks about Hadoop. You can submit an abstract at https://dataworkssummit.com/san-jose-2018/

Thanks,
Owen
Announcing Dynamometer Open Sourcing for HDFS Scale Testing
Hey folks,

We at LinkedIn have been working for a while on a scale testing and performance evaluation tool for HDFS, and particularly the NameNode, which we call Dynamometer. It is now open source: you can view it on our GitHub page [1], and read about our motivations and design in our blog post [2].

The Dynamometer framework sets up a NameNode and DataNodes inside of YARN containers to create a full-scale HDFS cluster, just without any actual data, and then starts a MapReduce job which is used to replay audit log traces to generate realistic load. We've been using this internally for quite a while and have found it to be very useful for verifying changes before they go live on our production clusters, quantifying the performance of releases, and investigating the performance implications of potential patches.

We hope that you will all find it useful as well and invite your contributions and feedback.

Thanks,
Erik Krogen
HDFS @ LinkedIn

[1]: https://github.com/linkedin/dynamometer
[2]: https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum
Re: Apache Hadoop 3.0.1 Release plan
HADOOP-14060 is a blocker. Daryn will add more detail to the jira or to this thread.

On Thu, Feb 8, 2018 at 7:01 AM, Brahma Reddy Battula wrote:
> Hi Eddy,
>
> HDFS-12990 got committed to 3.0.1, can we have RC for 3.0.1 (only
> YARN-5742 blocker is open)?
>
> On Sat, Feb 3, 2018 at 12:40 AM, Chris Douglas wrote:
>
>> On Fri, Feb 2, 2018 at 10:22 AM, Arpit Agarwal wrote:
>> > Do you plan to roll an RC with an uncommitted fix? That isn't the
>> > right approach.
>>
>> The fix will be committed to the release branch. We'll vote on the
>> release, and if it receives a majority of +1 votes then it becomes
>> 3.0.1. That's how the PMC decides how to move forward. In this case,
>> that will also resolve whether or not it can be committed to trunk.
>>
>> If this logic is unpersuasive, then we can require a 2/3 majority to
>> replace the codebase. Either way, the PMC will vote to define the
>> consensus view when it is not emergent.
>>
>> > This issue has good visibility and enough discussion.
>>
>> Yes, it has. We always prefer consensus to voting, but when discussion
>> reveals that complete consensus is impossible, we still need a way
>> forward. This is rare, and usually reserved for significant changes
>> (like merging YARN). Frankly, it's embarrassing to resort to it here,
>> but here we are.
>>
>> > If there is a binding veto in effect then the change must be
>> > abandoned. Else you should be able to proceed with committing.
>> > However, 3.0.0 must be called out as an abandoned release if we
>> > commit it.
>>
>> This is not accurate. A binding veto from any committer halts
>> progress, but the PMC sets the direction of the project. That includes
>> making decisions that are not universally accepted. -C
>>
>> > On 2/1/18, 3:01 PM, "Lei Xu" wrote:
>> >
>> > Sounds good to me, ATM.
>> >
>> > On Thu, Feb 1, 2018 at 2:34 PM, Aaron T. Myers wrote:
>> > > Hey Anu,
>> > >
>> > > My feeling on HDFS-12990 is that we've discussed it quite a bit
>> > > already and it doesn't seem at this point like either side is
>> > > going to budge. I'm certainly happy to have a phone call about it,
>> > > but I don't expect that we'd make much progress.
>> > >
>> > > My suggestion is that we simply include the patch posted to
>> > > HDFS-12990 in the 3.0.1 RC and call this issue out clearly in the
>> > > subsequent VOTE thread for the 3.0.1 release. Eddy, are you up for
>> > > that?
>> > >
>> > > Best,
>> > > Aaron
>> > >
>> > > On Thu, Feb 1, 2018 at 1:13 PM, Lei Xu wrote:
>> > >>
>> > >> +Xiao
>> > >>
>> > >> My understanding is that we will have this for 3.0.1. Xiao,
>> > >> could you give your inputs here?
>> > >>
>> > >> On Thu, Feb 1, 2018 at 11:55 AM, Anu Engineer
>> > >> <aengin...@hortonworks.com> wrote:
>> > >> > Hi Eddy,
>> > >> >
>> > >> > Thanks for driving this release. Just a quick question, do we
>> > >> > have time to close this issue?
>> > >> > https://issues.apache.org/jira/browse/HDFS-12990
>> > >> >
>> > >> > or are we abandoning it? I believe that this is the last
>> > >> > window for us to fix this issue.
>> > >> >
>> > >> > Should we have a call and get this resolved one way or
>> > >> > another?
>> > >> >
>> > >> > Thanks
>> > >> > Anu
>> > >> >
>> > >> > On 2/1/18, 10:51 AM, "Lei Xu" wrote:
>> > >> >
>> > >> >     Hi, All
>> > >> >
>> > >> >     I just cut branch-3.0.1 from branch-3.0. Please make sure
>> > >> >     all patches targeted to 3.0.1 being checked in both
>> > >> >     branch-3.0 and branch-3.0.1.
>> > >> >
>> > >> >     Thanks!
>> > >> >     Eddy
>> > >> >
>> > >> >     On Tue, Jan 9, 2018 at 11:17 AM, Lei Xu wrote:
>> > >> >     > Hi, All
>> > >> >     >
>> > >> >     > We have released Apache Hadoop 3.0.0 in December [1]. To
>> > >> >     > further improve the quality of release, we plan to cut
>> > >> >     > branch-3.0.1 branch tomorrow for the preparation of
>> > >> >     > Apache Hadoop 3.0.1 release. The focus of 3.0.1 will be
>> > >> >     > fixing blockers (3), critical bugs (1) and bug fixes
>> > >> >     > [2]. No new features and improvement should be included.
>> > >> >     >
>> > >> >     > We plan to cut branch-3.0.1 tomorrow (Jan 10th) and vote
>> > >> >     > for RC on Feb 1st, targeting for Feb 9th release.
>> > >> >     >
>> > >> >     > Please feel free to share your insights.
>> > >> >     >
>> > >> >     > [1]
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/

[Feb 7, 2018 6:54:53 AM] (xiao) HDFS-12933. Improve logging when DFSStripedOutputStream failed to write
[Feb 7, 2018 3:09:45 PM] (billie) YARN-7516. Add security check for trusted docker images. Contributed by
[Feb 7, 2018 3:17:00 PM] (billie) Revert "YARN-6078. Containers stuck in Localizing state. Contributed by
[Feb 7, 2018 5:40:33 PM] (brahma) HDFS-12935. Get ambiguous result for DFSAdmin command in HA mode when
[Feb 7, 2018 7:09:08 PM] (jlowe) YARN-7815. Make the YARN mounts added to Docker containers more
[Feb 7, 2018 7:22:36 PM] (jitendra) HDFS-11701. NPE from Unresolved Host causes permanent DFSInputStream
[Feb 7, 2018 8:58:09 PM] (yzhang) HDFS-13115. In getNumUnderConstructionBlocks(), ignore the inodeIds for

-1 overall

The following subsystems voted -1:
    findbugs mvnsite unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    FindBugs:
        module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
        org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources At Resource.java:[line 234]

    Failed junit tests:
        hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
        hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean
        hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070
        hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy
        hadoop.hdfs.TestDFSStripedOutputStreamWithFailure210
        hadoop.hdfs.TestDFSStripedOutputStream
        hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
        hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage
        hadoop.yarn.client.api.impl.TestAMRMClientPlacementConstraints
        hadoop.yarn.client.api.impl.TestAMRMClient
        hadoop.mapreduce.v2.TestMRJobs
        hadoop.mapreduce.v2.TestUberAM

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-compile-cc-root.txt [4.0K]

javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-compile-javac-root.txt [280K]

checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-checkstyle-root.txt [17M]

mvnsite: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-mvnsite-root.txt [112K]

pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-patch-pylint.txt [24K]

shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-patch-shellcheck.txt [20K]

shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-patch-shelldocs.txt [60K]

whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/whitespace-eol.txt [9.2M]
            https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/whitespace-tabs.txt [292K]

xml: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/xml.txt [8.0K]

findbugs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html [8.0K]

javadoc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-javadoc-javadoc-root.txt [760K]

unit: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [316K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [48K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt [16K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [88K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt [8.0K]

Powered by Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org
[jira] [Resolved] (HDFS-11360) HDFS balancer need a config to appoint to a decided nameservice
[ https://issues.apache.org/jira/browse/HDFS-11360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lantao Jin resolved HDFS-11360.
-------------------------------
    Resolution: Duplicate
      Assignee: Lantao Jin

> HDFS balancer need a config to appoint to a decided nameservice
> ---------------------------------------------------------------
>
>                 Key: HDFS-11360
>                 URL: https://issues.apache.org/jira/browse/HDFS-11360
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.1
>            Reporter: Lantao Jin
>            Assignee: Lantao Jin
>            Priority: Minor
>
> With a distcp configuration, there are two or more nameservices set in
> hdfs-site.xml, for example:
> {code}
> <property>
>   <name>dfs.nameservices</name>
>   <value>one-nn-ha,two-nn-ha</value>
> </property>
> {code}
> If the HDFS Balancer also launches on that node, it will create IPC
> threads to connect to both sets of NNs, and block moving happens in both
> clusters from this one Balancer. Although I didn't find any block moving
> between the different clusters, the behavior is still weird and unexpected.
> So the best way is to add a configuration that appoints a specific
> nameservice for the Balancer to run against. I can offer a patch. Any
> better ideas?
Why always allocate shm slot when local read even if no zero copy needed?
Hello,

It seems that we always allocate a shm slot for a local read, even if we only do the short-circuit read without zero-copy. Can we avoid this allocation when no zero-copy is needed? According to my understanding, the shm slot is not used if we don't do zero-copy on a local read, is that right?

    public ShortCircuitReplica(ExtendedBlockId key,
        FileInputStream dataStream, FileInputStream metaStream,
        ShortCircuitCache cache, long creationTimeMs, Slot slot)
        throws IOException {

--
Xie Gang