[jira] [Created] (HDFS-13128) HDFS balancer in single node cluster fails with "Another Balancer is running.."

2018-02-08 Thread Zbigniew Kostrzewa (JIRA)
Zbigniew Kostrzewa created HDFS-13128:
-

 Summary: HDFS balancer in single node cluster fails with "Another 
Balancer is running.."
 Key: HDFS-13128
 URL: https://issues.apache.org/jira/browse/HDFS-13128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover, hdfs
Affects Versions: 2.7.3
Reporter: Zbigniew Kostrzewa


In a single node "cluster", the HDFS balancer fails with:
{noformat}
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
java.io.IOException: Another Balancer is running.. Exiting ...
{noformat}
and in Name Node logs there is:
{noformat}
2018-02-09 07:23:21,671 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073741865_1041{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-dae233d3-5c71-498e-9a8b-669bff3fccdf:NORMAL:10.9.4.184:30010|RBW]]} for /system/balancer.id
2018-02-09 07:23:21,739 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1126407107_1
2018-02-09 07:23:21,758 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.append: Failed to APPEND_FILE /system/balancer.id for DFSClient_NONMAPREDUCE_1275100437_1 on 10.9.4.184 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-1126407107_1 on 10.9.4.184
2018-02-09 07:23:21,758 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.append from 10.9.4.184:49781 Call#12 Retry#0: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: Failed to APPEND_FILE /system/balancer.id for DFSClient_NONMAPREDUCE_1275100437_1 on 10.9.4.184 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-1126407107_1 on 10.9.4.184
2018-02-09 07:23:21,773 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.9.4.184:30010 is added to blk_1073741865_1041{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-dae233d3-5c71-498e-9a8b-669bff3fccdf:NORMAL:10.9.4.184:30010|RBW]]} size 15
2018-02-09 07:23:21,776 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /system/balancer.id is closed by DFSClient_NONMAPREDUCE_-1126407107_1
{noformat}
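
For context, the balancer uses /system/balancer.id as a lock file: it probes the
file with an append and interprets a lease conflict (the
AlreadyBeingCreatedException above) as another running balancer. A minimal sketch
of that probe, assuming the 2.7.x NameNodeConnector behavior (simplified fragment,
not the verbatim code):
{code}
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.ipc.RemoteException;

// Sketch of the lock-file probe: append to detect a live lease, then recreate
// the id file. A lease conflict surfaces as "Another Balancer is running..".
OutputStream checkAndMarkRunning(FileSystem fs, Path idPath) throws IOException {
  try {
    if (fs.exists(idPath)) {
      // Appending fails fast if another client still holds the file's lease.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    FSDataOutputStream out = fs.create(idPath, false);
    out.writeBytes(InetAddress.getLocalHost().getHostName());
    out.hflush();
    return out;  // held open for the lifetime of this balancer run
  } catch (RemoteException e) {
    // e.g. AlreadyBeingCreatedException: another client owns the lease.
    return null;
  }
}
{code}
In the log above, the append probe from one DFSClient hits the still-active lease
of the client that had just created the file on the same host, which is what trips
the check.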



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-13127) Fix TestContainerStateManager and TestOzoneConfigurationFields

2018-02-08 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDFS-13127:


 Summary: Fix TestContainerStateManager and 
TestOzoneConfigurationFields
 Key: HDFS-13127
 URL: https://issues.apache.org/jira/browse/HDFS-13127
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
 Fix For: HDFS-7240


TestContainerStateManager is failing because SCM is unable to find a container 
with enough free space to allocate a new block.

TestOzoneConfigurationFields is failing because the configs "ozone.rest.servers" 
and "ozone.rest.client.port" are present in ozone-default.xml but are not 
declared as config keys anywhere in the code.
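
The usual fix is to declare matching constants so the test can find the keys; a
minimal sketch, assuming the standard Hadoop config-key convention (the hosting
class below is hypothetical):
{code}
// Hypothetical constants holder: declares the two keys so that
// TestOzoneConfigurationFields can match them against ozone-default.xml.
public final class OzoneRestConfigKeys {
  public static final String OZONE_REST_SERVERS = "ozone.rest.servers";
  public static final String OZONE_REST_CLIENT_PORT = "ozone.rest.client.port";

  private OzoneRestConfigKeys() {
    // constants only, no instances
  }
}
{code}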






Re: [DISCUSS] Meetup for HDFS tests and build infra

2018-02-08 Thread 郑锴(铁杰)
Thanks, Chris, for driving this.

> I'm looking at you, TestDFSStripedOutputStreamWithFailure ...

AFAIK and IMO, it's pretty hard to get all the test cases running stably given 
the limitations of MiniDFSCluster. If we agree on that, we could remove these 
cases as unit tests and cover them in integration tests instead, using a real 
cluster, e.g. based on k8s infra. We lack the basic infrastructure and tools to 
get most of the complicated functionality well tested and covered, so let's 
avoid overly complicated tests. Fixing such tests would definitely help and be 
appreciated.

Regards,
Kai

--
From: Chris Douglas
Sent: Thursday, February 8, 2018 08:39
To: Hdfs-dev
Subject: Re: [DISCUSS] Meetup for HDFS tests and build infra
Created a poll [1] to inform scheduling. -C

[1]: https://doodle.com/poll/r22znitzae9apfbf

On Tue, Feb 6, 2018 at 3:09 PM, Chris Douglas  wrote:
> The HDFS build is not healthy. Many of the unit tests aren't actually
> run in Jenkins due to resource exhaustion, haven't been updated since
> build/test/data was the test temp dir, or are chronically unstable
> (I'm looking at you, TestDFSStripedOutputStreamWithFailure). The
> situation has deteriorated slowly, but we can't confidently merge
> patches, let alone significant features, when our CI infra is in this
> state.
>
> How would folks feel about a half to full-day meetup to work through
> patches improving this, specifically? We can improve tests,
> troubleshoot the build, and rev/commit existing patches. It would
> require some preparation, so the simultaneous attention is productive
> and not a coordination bottleneck. I started a wiki page for this [1],
> please add to it.
>
> If enough people can make time for this, say in 2-3 weeks, the project
> would certainly benefit. -C
>
> [1]: https://s.apache.org/ng3C


[jira] [Resolved] (HDFS-13122) Tailing edits should not update quota counts on ObserverNode

2018-02-08 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen resolved HDFS-13122.

Resolution: Duplicate

> Tailing edits should not update quota counts on ObserverNode
> 
>
> Key: HDFS-13122
> URL: https://issues.apache.org/jira/browse/HDFS-13122
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call
> {code}
> updateCountForQuota(target.getBlockManager().getStoragePolicySuite(), 
> target.dir.rootDir);
> {code}
> to update the quota counts for the entire namespace, which can be very 
> expensive. This makes sense if we are about to become the ANN, since we need 
> valid quotas, but not on an ObserverNode which does not need to enforce 
> quotas.
> This is related to increasing the frequency with which the SbNN can tail 
> edits from the ANN to decrease the lag time for transactions to appear on the 
> Observer.
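
A minimal sketch of the proposed gating (illustrative fragment; updateCountForQuota
is the FSImage call quoted above, while the willEnforceQuotas flag is hypothetical):
{code}
// Fragment of a possible call-site change in FSImage#loadEdits: only recompute
// quota counts on nodes that will actually enforce quotas (i.e. not Observers).
if (willEnforceQuotas) {
  // Expensive: walks the entire namespace to recompute quota counts.
  updateCountForQuota(target.getBlockManager().getStoragePolicySuite(),
      target.dir.rootDir);
}
{code}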






[jira] [Reopened] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-08 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reopened HDFS-13120:
---

This breaks 2.8 compilation. Reverting...

> Snapshot diff could be corrupted after concat
> -
>
> Key: HDFS-13120
> URL: https://issues.apache.org/jira/browse/HDFS-13120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Affects Versions: 2.7.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6
>
> Attachments: HDFS-13120.001.patch, HDFS-13120.002.patch
>
>
> The snapshot diff can be corrupted after concatenating files. This can lead to 
> assertion failures in later DeleteSnapshot and getSnapshotDiff operations. 
> For example, we have seen customers hit a stack trace similar to the one below, 
> but while loading the edit entry of a DeleteSnapshotOp. After investigation, 
> we found this is a regression caused by HDFS-3689, where the snapshot diff is 
> not fully cleaned up after concat. 
> I will post a unit test to repro this and a fix for it shortly.
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
> already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
>   at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code} 
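
A hypothetical repro sketch based on the description above (paths, file names, and
the exact sequence are illustrative; the committed unit test may differ):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Hypothetical repro, following the description; names and ordering are guesses.
void reproSnapshotDiffCorruption(Configuration conf) throws IOException {
  DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
  Path dir = new Path("/snapdir");
  dfs.mkdirs(dir);
  dfs.allowSnapshot(dir);
  dfs.createSnapshot(dir, "s0");

  // Files created after s0 land in the snapshot diff's CREATED list.
  for (String name : new String[] {"0.txt", "1.txt", "2.txt"}) {
    try (FSDataOutputStream out = dfs.create(new Path(dir, name))) {
      out.writeBytes("some data");
    }
  }

  // Concat folds 1.txt and 2.txt into 0.txt; per the description, the diff
  // entries for the sources are not fully cleaned up afterwards.
  dfs.concat(new Path(dir, "0.txt"),
      new Path[] {new Path(dir, "1.txt"), new Path(dir, "2.txt")});
  dfs.createSnapshot(dir, "s1");

  // Deleting the earlier snapshot combines the corrupted diffs and can trip
  // the "Element already exists" assertion in the stack trace above.
  dfs.deleteSnapshot(dir, "s0");
}
{code}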






Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-02-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/130/

[Feb 7, 2018 4:58:28 PM] (billie) Revert "YARN-6078. Containers stuck in 
Localizing state. Contributed by
[Feb 7, 2018 11:35:41 PM] (yzhang) HDFS-13115. In 
getNumUnderConstructionBlocks(), ignore the inodeIds for




-1 overall


The following subsystems voted -1:
asflicense unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Unreaped Processes :

   hadoop-common:1 
   hadoop-hdfs:24 
   bkjournal:5 
   hadoop-yarn-common:1 
   hadoop-yarn-server-nodemanager:1 
   hadoop-yarn-server-timelineservice:1 
   hadoop-yarn-client:4 
   hadoop-yarn-applications-distributedshell:1 
   hadoop-mapreduce-client-app:1 
   hadoop-mapreduce-client-jobclient:14 
   hadoop-distcp:2 
   hadoop-extras:1 

Failed junit tests :

   hadoop.hdfs.TestBlocksScheduledCounter 
   hadoop.hdfs.TestDFSClientFailover 
   hadoop.hdfs.web.TestHttpsFileSystem 
   hadoop.hdfs.TestSetTimes 
   hadoop.hdfs.TestDatanodeRegistration 
   hadoop.hdfs.web.TestWebHdfsFileSystemContract 
   hadoop.hdfs.web.TestWebHDFSAcl 
   hadoop.hdfs.TestDatanodeReport 
   hadoop.hdfs.TestMiniDFSCluster 
   hadoop.hdfs.web.TestHftpFileSystem 
   hadoop.hdfs.TestDFSClientRetries 
   hadoop.yarn.server.nodemanager.webapp.TestNMWebServer 
   hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime 
   hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels 
   hadoop.yarn.server.nodemanager.TestNodeStatusUpdater 
   hadoop.mapred.TestJavaSerialization 
   hadoop.mapreduce.TestMRJobClient 
   hadoop.mapred.TestClientRedirect 
   hadoop.mapred.TestMapProgress 
   hadoop.mapred.TestReduceFetch 
   hadoop.mapreduce.security.ssl.TestEncryptedShuffle 
   hadoop.mapred.TestLocalJobSubmission 
   hadoop.mapreduce.security.TestBinaryTokenFile 
   hadoop.mapreduce.security.TestJHSSecurity 
   hadoop.fs.TestFileSystem 
   hadoop.mapreduce.TestChild 
   hadoop.mapreduce.security.TestMRCredentials 
   hadoop.conf.TestNoDefaultsJobConf 
   hadoop.fs.TestDFSIO 
   hadoop.mapred.TestJobSysDirWithDFS 
   hadoop.tools.TestIntegration 
   hadoop.tools.TestDistCpViewFs 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.resourceestimator.service.TestResourceEstimatorService 

Timed out junit tests :

   org.apache.hadoop.log.TestLogLevel 
   org.apache.hadoop.hdfs.TestLeaseRecovery2 
   org.apache.hadoop.security.TestPermission 
   org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream 
   org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade 
   org.apache.hadoop.hdfs.TestFileAppendRestart 
   org.apache.hadoop.hdfs.TestReadWhileWriting 
   org.apache.hadoop.hdfs.security.TestDelegationToken 
   org.apache.hadoop.security.TestPermissionSymlinks 
   org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter 
   org.apache.hadoop.hdfs.TestDFSMkdirs 
   org.apache.hadoop.hdfs.TestDFSOutputStream 
   org.apache.hadoop.hdfs.web.TestWebHDFS 
   org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs 
   org.apache.hadoop.hdfs.web.TestWebHDFSXAttr 
   org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes 
   org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs 
   org.apache.hadoop.hdfs.TestDistributedFileSystem 
   org.apache.hadoop.hdfs.web.TestWebHDFSForHA 
   org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication 
   org.apache.hadoop.hdfs.TestDFSShell 
   org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead 
   org.apache.hadoop.yarn.webapp.TestWebApp 
   org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot 
   org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServices 
   org.apache.hadoop.yarn.client.TestRMFailover 
   org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClientWithReservation 
   org.apache.hadoop.yarn.client.api.impl.TestAMRMClient 
   org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell 
   org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup 
   org.apache.hadoop.mapred.lib.TestDelegatingInputFormat 
   

[jira] [Created] (HDFS-13126) Re-enable HTTP Request Logging for WebHDFS

2018-02-08 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-13126:
--

 Summary: Re-enable HTTP Request Logging for WebHDFS
 Key: HDFS-13126
 URL: https://issues.apache.org/jira/browse/HDFS-13126
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, webhdfs
Affects Versions: 2.7.0
Reporter: Erik Krogen


Due to HDFS-7279, starting in 2.7.0 the DataNode HTTP request logs no longer 
include WebHDFS requests, because request logging was handled inside 
{{HttpServer2}}, which the DataNode no longer uses. When request logging is 
enabled, we should add a Netty 
[LoggingHandler|https://netty.io/4.0/api/io/netty/handler/logging/LoggingHandler.html]
 to the ChannelPipeline of the http(s) servers used by the DataNode.
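
A sketch of the proposed wiring (illustrative fragment; the pipeline setup lives
in the DataNode's Netty-based web server, and the enable flag shown here is
hypothetical):
{code}
import io.netty.channel.ChannelPipeline;
import io.netty.handler.logging.LogLevel;
import io.netty.handler.logging.LoggingHandler;

// Sketch: prepend a LoggingHandler so every inbound/outbound HTTP event on the
// DataNode's WebHDFS channels is logged again.
void addRequestLogging(ChannelPipeline pipeline, boolean requestLogEnabled) {
  if (requestLogEnabled) {  // hypothetical gate mirroring the old request-log config
    pipeline.addFirst("requestLog", new LoggingHandler(LogLevel.INFO));
  }
}
{code}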






Re: Apache Hadoop 3.0.1 Release plan

2018-02-08 Thread Lei Xu
Hi, Brahma

Thanks for the reminder. YARN-5742 does not look like a blocker to me. I
will create an RC right after HADOOP-14060.

On Thu, Feb 8, 2018 at 7:35 AM, Kihwal Lee  wrote:
> HADOOP-14060 is a blocker.  Daryn will add more detail to the jira or to
> this thread.
>
> On Thu, Feb 8, 2018 at 7:01 AM, Brahma Reddy Battula 
> wrote:
>>
>> Hi Eddy,
>>
>> HDFS-12990 got committed to 3.0.1; can we have an RC for 3.0.1 (only the
>> YARN-5742 blocker is open)?
>>
>>
>> On Sat, Feb 3, 2018 at 12:40 AM, Chris Douglas 
>> wrote:
>>
>> > On Fri, Feb 2, 2018 at 10:22 AM, Arpit Agarwal
>> > 
>> > wrote:
>> > > Do you plan to roll an RC with an uncommitted fix? That isn't the
>> > > right
>> > approach.
>> >
>> > The fix will be committed to the release branch. We'll vote on the
>> > release, and if it receives a majority of +1 votes then it becomes
>> > 3.0.1. That's how the PMC decides how to move forward. In this case,
>> > that will also resolve whether or not it can be committed to trunk.
>> >
>> > If this logic is unpersuasive, then we can require a 2/3 majority to
>> > replace the codebase. Either way, the PMC will vote to define the
>> > consensus view when it is not emergent.
>> >
>> > > This issue has good visibility and enough discussion.
>> >
>> > Yes, it has. We always prefer consensus to voting, but when discussion
>> > reveals that complete consensus is impossible, we still need a way
>> > forward. This is rare, and usually reserved for significant changes
>> > (like merging YARN). Frankly, it's embarrassing to resort to it here,
>> > but here we are.
>> >
>> > > If there is a binding veto in effect then the change must be
>> > > abandoned.
>> > Else you should be able to proceed with committing. However, 3.0.0 must
>> > be
>> > called out as an abandoned release if we commit it.
>> >
>> > This is not accurate. A binding veto from any committer halts
>> > progress, but the PMC sets the direction of the project. That includes
>> > making decisions that are not universally accepted. -C
>> >
>> > > On 2/1/18, 3:01 PM, "Lei Xu"  wrote:
>> > >
>> > > Sounds good to me, ATM.
>> > >
>> > > On Thu, Feb 1, 2018 at 2:34 PM, Aaron T. Myers 
>> > wrote:
>> > > > Hey Anu,
>> > > >
>> > > > My feeling on HDFS-12990 is that we've discussed it quite a bit
>> > already and
>> > > > it doesn't seem at this point like either side is going to
>> > > budge.
>> > I'm
>> > > > certainly happy to have a phone call about it, but I don't
>> > > expect
>> > that we'd
>> > > > make much progress.
>> > > >
>> > > > My suggestion is that we simply include the patch posted to
>> > HDFS-12990 in
>> > > > the 3.0.1 RC and call this issue out clearly in the subsequent
>> > VOTE thread
>> > > > for the 3.0.1 release. Eddy, are you up for that?
>> > > >
>> > > > Best,
>> > > > Aaron
>> > > >
>> > > > On Thu, Feb 1, 2018 at 1:13 PM, Lei Xu  wrote:
>> > > >>
>> > > >> +Xiao
>> > > >>
>> > > >> My understanding is that we will have this for 3.0.1.   Xiao,
>> > could
>> > > >> you give your inputs here?
>> > > >>
>> > > >> On Thu, Feb 1, 2018 at 11:55 AM, Anu Engineer <
>> > aengin...@hortonworks.com>
>> > > >> wrote:
>> > > >> > Hi Eddy,
>> > > >> >
>> > > >> > Thanks for driving this release. Just a quick question, do we
>> > have time
>> > > >> > to close this issue?
>> > > >> > https://issues.apache.org/jira/browse/HDFS-12990
>> > > >> >
>> > > >> > or are we abandoning it? I believe that this is the last
>> > > window
>> > for us
>> > > >> > to fix this issue.
>> > > >> >
>> > > >> > Should we have a call and get this resolved one way or
>> > > another?
>> > > >> >
>> > > >> > Thanks
>> > > >> > Anu
>> > > >> >
>> > > >> > On 2/1/18, 10:51 AM, "Lei Xu"  wrote:
>> > > >> >
>> > > >> > Hi, All
>> > > >> >
>> > > >> > I just cut branch-3.0.1 from branch-3.0.  Please make
>> > > sure
>> > all
>> > > >> > patches
>> > > >> > targeted to 3.0.1 being checked in both branch-3.0 and
>> > branch-3.0.1.
>> > > >> >
>> > > >> > Thanks!
>> > > >> > Eddy
>> > > >> >
>> > > >> > On Tue, Jan 9, 2018 at 11:17 AM, Lei Xu
>> > > 
>> > wrote:
>> > > >> > > Hi, All
>> > > >> > >
>> > > >> > > We have released Apache Hadoop 3.0.0 in December [1].
>> > > To
>> > further
>> > > >> > > improve the quality of release, we plan to cut
>> > branch-3.0.1 branch
>> > > >> > > tomorrow for the preparation of Apache Hadoop 3.0.1
>> > release. The
>> > > >> > focus
>> > > >> > > of 3.0.1 will be fixing blockers (3), critical bugs (1)
>> > and bug
>> > > >> > fixes
>> > > >> > > [2].  No new features and 

[jira] [Created] (HDFS-13125) Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing

2018-02-08 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-13125:
--

 Summary: Improve efficiency of JN -> Standby Pipeline Under 
Frequent Edit Tailing
 Key: HDFS-13125
 URL: https://issues.apache.org/jira/browse/HDFS-13125
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen


The current edit tailing pipeline is designed for:
* High resiliency
* High throughput

It was _not_ designed for low latency.

It was designed under the assumption that each edit log segment would typically 
be read all at once, e.g. on startup, or by the SbNN tailing the entire segment 
after it is finalized. The ObserverNode, however, should be reading constantly 
from the JournalNodes' in-progress edit logs with low latency, to reduce the lag 
time between when a transaction is committed on the ANN and when it is visible 
on the ObserverNode.

Due to how critical this pipeline is to the health of HDFS, it would be better 
not to redesign it altogether. Based on some experiments, it seems that if we 
mitigate the following issues, lag times drop to low levels (low hundreds of 
milliseconds even under very high write load):
* The overhead of creating a new HTTP connection each time new edits are 
fetched. This makes sense when you're expecting to tail an entire segment; it 
does not when you may only be fetching a small number of edits. We can mitigate 
this by allowing edits to be tailed via an RPC call, or by adding a connection 
pool for the existing connections to the journal.
* The overhead of transmitting a whole file at once. Right now when an edit 
segment is requested, the JN sends the entire segment, and the SbNN ignores 
edits up to the ones it wants. Solving this may be trickier, but one suggestion 
would be to keep recently logged edits in memory, avoiding the need to serve 
them from file at all and allowing the JN to quickly serve only the required 
edits (see the sketch below).

We can implement these as optimizations on top of the existing logic, with 
fallbacks to the current slow-but-resilient pipeline.
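
To make the in-memory idea concrete, a minimal sketch (entirely illustrative; the
class and its names are not from any patch):
{code}
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: a bounded, txid-keyed cache of recently logged edits so
// the JournalNode can serve small tails without re-reading segment files.
class RecentEditsCache {
  private final TreeMap<Long, byte[]> editsByTxid = new TreeMap<>();
  private final long capacityBytes;
  private long cachedBytes = 0;

  RecentEditsCache(long capacityBytes) {
    this.capacityBytes = capacityBytes;
  }

  synchronized void put(long txid, byte[] serializedEdit) {
    editsByTxid.put(txid, serializedEdit);
    cachedBytes += serializedEdit.length;
    while (cachedBytes > capacityBytes) {  // evict the oldest edits first
      Map.Entry<Long, byte[]> oldest = editsByTxid.pollFirstEntry();
      cachedBytes -= oldest.getValue().length;
    }
  }

  // Returns edits at or after fromTxid, or null if they have already aged out,
  // in which case the caller falls back to the file-based pipeline.
  synchronized TreeMap<Long, byte[]> tail(long fromTxid) {
    if (editsByTxid.isEmpty() || editsByTxid.firstKey() > fromTxid) {
      return null;
    }
    return new TreeMap<>(editsByTxid.tailMap(fromTxid, true));
  }
}
{code}
Served over an RPC endpoint (the first bullet), something like this would avoid
both the per-fetch HTTP connection and the whole-segment transfer.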






[CFP] Dataworks Summit San Jose Call for Presentations closes Friday 9 Feb

2018-02-08 Thread Owen O'Malley
All,

Dataworks Summit San Jose 2018 is June 17-21.   The call for abstracts is
open through February 9th.  One of the tracks is Big Compute And Storage,
which is great for talks about Hadoop.  You can submit an abstract at
https://dataworkssummit.com/san-jose-2018/

Thanks,
   Owen


Announcing Dynamometer Open Sourcing for HDFS Scale Testing

2018-02-08 Thread Erik Krogen
Hey folks,

We at LinkedIn have been working for a while on a scale testing and performance 
evaluation tool for HDFS and particularly the NameNode, which we call 
Dynamometer. It is now open source: you can view it on our GitHub page [1], and 
read about our motivations and design in our blog post [2]. The Dynamometer 
framework sets up a NameNode and DataNodes inside of YARN containers to create 
a full-scale HDFS cluster, just without any actual data, and then starts a 
MapReduce job which is used to replay audit log traces to generate realistic 
load. We’ve been using this internally for quite a while and have found it to 
be very useful for verifying changes before they go live on our production 
clusters, quantifying the performance of releases, and investigating the 
performance implications of potential patches. We hope that you will all find 
it useful as well and invite your contributions and feedback.

Thanks,
Erik Krogen
HDFS @ LinkedIn

[1]: https://github.com/linkedin/dynamometer
[2]: 
https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum





Re: Apache Hadoop 3.0.1 Release plan

2018-02-08 Thread Kihwal Lee
HADOOP-14060 is a blocker.  Daryn will add more detail to the jira or to
this thread.

On Thu, Feb 8, 2018 at 7:01 AM, Brahma Reddy Battula 
wrote:

> Hi Eddy,
>
> HDFS-12990 got committed to 3.0.1; can we have an RC for 3.0.1 (only the
> YARN-5742 blocker is open)?
>
>
> On Sat, Feb 3, 2018 at 12:40 AM, Chris Douglas 
> wrote:
>
> > On Fri, Feb 2, 2018 at 10:22 AM, Arpit Agarwal  >
> > wrote:
> > > Do you plan to roll an RC with an uncommitted fix? That isn't the right
> > approach.
> >
> > The fix will be committed to the release branch. We'll vote on the
> > release, and if it receives a majority of +1 votes then it becomes
> > 3.0.1. That's how the PMC decides how to move forward. In this case,
> > that will also resolve whether or not it can be committed to trunk.
> >
> > If this logic is unpersuasive, then we can require a 2/3 majority to
> > replace the codebase. Either way, the PMC will vote to define the
> > consensus view when it is not emergent.
> >
> > > This issue has good visibility and enough discussion.
> >
> > Yes, it has. We always prefer consensus to voting, but when discussion
> > reveals that complete consensus is impossible, we still need a way
> > forward. This is rare, and usually reserved for significant changes
> > (like merging YARN). Frankly, it's embarrassing to resort to it here,
> > but here we are.
> >
> > > If there is a binding veto in effect then the change must be abandoned.
> > Else you should be able to proceed with committing. However, 3.0.0 must
> be
> > called out as an abandoned release if we commit it.
> >
> > This is not accurate. A binding veto from any committer halts
> > progress, but the PMC sets the direction of the project. That includes
> > making decisions that are not universally accepted. -C
> >
> > > On 2/1/18, 3:01 PM, "Lei Xu"  wrote:
> > >
> > > Sounds good to me, ATM.
> > >
> > > On Thu, Feb 1, 2018 at 2:34 PM, Aaron T. Myers 
> > wrote:
> > > > Hey Anu,
> > > >
> > > > My feeling on HDFS-12990 is that we've discussed it quite a bit
> > already and
> > > > it doesn't seem at this point like either side is going to budge.
> > I'm
> > > > certainly happy to have a phone call about it, but I don't expect
> > that we'd
> > > > make much progress.
> > > >
> > > > My suggestion is that we simply include the patch posted to
> > HDFS-12990 in
> > > > the 3.0.1 RC and call this issue out clearly in the subsequent
> > VOTE thread
> > > > for the 3.0.1 release. Eddy, are you up for that?
> > > >
> > > > Best,
> > > > Aaron
> > > >
> > > > On Thu, Feb 1, 2018 at 1:13 PM, Lei Xu  wrote:
> > > >>
> > > >> +Xiao
> > > >>
> > > >> My understanding is that we will have this for 3.0.1.   Xiao,
> > could
> > > >> you give your inputs here?
> > > >>
> > > >> On Thu, Feb 1, 2018 at 11:55 AM, Anu Engineer <
> > aengin...@hortonworks.com>
> > > >> wrote:
> > > >> > Hi Eddy,
> > > >> >
> > > >> > Thanks for driving this release. Just a quick question, do we
> > have time
> > > >> > to close this issue?
> > > >> > https://issues.apache.org/jira/browse/HDFS-12990
> > > >> >
> > > >> > or are we abandoning it? I believe that this is the last
> window
> > for us
> > > >> > to fix this issue.
> > > >> >
> > > >> > Should we have a call and get this resolved one way or
> another?
> > > >> >
> > > >> > Thanks
> > > >> > Anu
> > > >> >
> > > >> > On 2/1/18, 10:51 AM, "Lei Xu"  wrote:
> > > >> >
> > > >> > Hi, All
> > > >> >
> > > >> > I just cut branch-3.0.1 from branch-3.0.  Please make sure
> > all
> > > >> > patches
> > > >> > targeted to 3.0.1 being checked in both branch-3.0 and
> > branch-3.0.1.
> > > >> >
> > > >> > Thanks!
> > > >> > Eddy
> > > >> >
> > > >> > On Tue, Jan 9, 2018 at 11:17 AM, Lei Xu  >
> > wrote:
> > > >> > > Hi, All
> > > >> > >
> > > >> > > We have released Apache Hadoop 3.0.0 in December [1]. To
> > further
> > > >> > > improve the quality of release, we plan to cut
> > branch-3.0.1 branch
> > > >> > > tomorrow for the preparation of Apache Hadoop 3.0.1
> > release. The
> > > >> > focus
> > > >> > > of 3.0.1 will be fixing blockers (3), critical bugs (1)
> > and bug
> > > >> > fixes
> > > >> > > [2].  No new features and improvements should be included.
> > > >> > >
> > > >> > > We plan to cut branch-3.0.1 tomorrow (Jan 10th) and vote
> > for RC on
> > > >> > Feb
> > > >> > > 1st, targeting for Feb 9th release.
> > > >> > >
> > > >> > > Please feel free to share your insights.
> > > >> > >
> > > >> > > [1]
> > > >> > 

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-02-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/

[Feb 7, 2018 6:54:53 AM] (xiao) HDFS-12933. Improve logging when 
DFSStripedOutputStream failed to write
[Feb 7, 2018 3:09:45 PM] (billie) YARN-7516. Add security check for trusted 
docker images. Contributed by
[Feb 7, 2018 3:17:00 PM] (billie) Revert "YARN-6078. Containers stuck in 
Localizing state. Contributed by
[Feb 7, 2018 5:40:33 PM] (brahma) HDFS-12935. Get ambiguous result for DFSAdmin 
command in HA mode when
[Feb 7, 2018 7:09:08 PM] (jlowe) YARN-7815. Make the YARN mounts added to 
Docker containers more
[Feb 7, 2018 7:22:36 PM] (jitendra) HDFS-11701. NPE from Unresolved Host causes 
permanent DFSInputStream
[Feb 7, 2018 8:58:09 PM] (yzhang) HDFS-13115. In 
getNumUnderConstructionBlocks(), ignore the inodeIds for




-1 overall


The following subsystems voted -1:
findbugs mvnsite unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
   org.apache.hadoop.yarn.api.records.Resource.getResources() may expose 
internal representation by returning Resource.resources At Resource.java:by 
returning Resource.resources At Resource.java:[line 234] 

Failed junit tests :

   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure 
   hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 
   hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure210 
   hadoop.hdfs.TestDFSStripedOutputStream 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration 
   hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage 
   hadoop.yarn.client.api.impl.TestAMRMClientPlacementConstraints 
   hadoop.yarn.client.api.impl.TestAMRMClient 
   hadoop.mapreduce.v2.TestMRJobs 
   hadoop.mapreduce.v2.TestUberAM 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-compile-javac-root.txt
  [280K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-checkstyle-root.txt
  [17M]

   mvnsite:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-mvnsite-root.txt
  [112K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-patch-shelldocs.txt
  [60K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/whitespace-eol.txt
  [9.2M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/whitespace-tabs.txt
  [292K]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/xml.txt
  [8.0K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/diff-javadoc-javadoc-root.txt
  [760K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [316K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [48K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [88K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/685/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
  [8.0K]

Powered by Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org


[jira] [Resolved] (HDFS-11360) HDFS balancer need a config to appoint to a decided nameservice

2018-02-08 Thread Lantao Jin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin resolved HDFS-11360.
---
Resolution: Duplicate
  Assignee: Lantao Jin

> HDFS balancer need a config to appoint to a decided nameservice
> ---
>
> Key: HDFS-11360
> URL: https://issues.apache.org/jira/browse/HDFS-11360
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.7.1
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
>
> With a distcp setup, there can be two or more nameservices configured in 
> hdfs-site.xml, for example:
> {code}
> <property>
>   <name>dfs.nameservices</name>
>   <value>one-nn-ha,two-nn-ha</value>
> </property>
> {code}
> If the HDFS Balancer is also launched on that node, it will create IPC 
> threads to connect to both NNs, and block moving happens in both clusters 
> from this single Balancer. Although I didn't find any blocks being moved 
> between the different clusters, the behavior is still weird and unexpected.
> So the best way is to add a configuration that pins the balancer to a chosen 
> nameservice. I can offer a patch. Any better ideas?
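
A sketch of the proposed filtering (hypothetical throughout: the config key and
helper class illustrate the idea, they are not from an actual patch):
{code}
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.conf.Configuration;

// Hypothetical helper: restrict the balancer to one configured nameservice.
class BalancerNameserviceFilter {
  // Hypothetical key naming the nameservice the balancer should work on.
  static final String DFS_BALANCER_NAMESERVICE_KEY = "dfs.balancer.nameservice";

  static Collection<URI> filter(Collection<URI> all, Configuration conf) {
    String target = conf.getTrimmed(DFS_BALANCER_NAMESERVICE_KEY);
    if (target == null || target.isEmpty()) {
      return all;  // default behavior: balance every configured nameservice
    }
    List<URI> filtered = new ArrayList<>();
    for (URI nn : all) {
      if (target.equals(nn.getHost())) {  // nameservice id is the URI authority
        filtered.add(nn);
      }
    }
    return filtered;
  }
}
{code}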






Why do we always allocate a shm slot for local reads even if no zero-copy is needed?

2018-02-08 Thread Xie Gang
Hello,

It seems that we always allocate a shm slot for local reads, even if we only
do the short-circuit read without zero-copy. Can we skip the slot allocation
when no zero-copy is needed?

According to my understanding, the shm slot is not used if we don't do
zero-copy on a local read; is that right?

public ShortCircuitReplica(ExtendedBlockId key,
    FileInputStream dataStream, FileInputStream metaStream,
    ShortCircuitCache cache, long creationTimeMs, Slot slot) throws IOException {


-- 
Xie Gang